CN111581459B

CN111581459B - Character string matching method and character string matching system

Info

Publication number: CN111581459B
Application number: CN202010538767.5A
Authority: CN
Inventors: 杨嘉佳; 唐球; 徐睿; 刘金; 张雷; 吴云峰
Original assignee: 6th Research Institute of China Electronics Corp
Current assignee: 6th Research Institute of China Electronics Corp
Priority date: 2020-06-13
Filing date: 2020-06-13
Publication date: 2021-06-15
Anticipated expiration: 2040-06-13
Also published as: CN111581459A

Abstract

The application provides a character string matching method and a character string matching system. A text to be matched is divided into a plurality of segments of character strings to be matched. To avoid missing the boundary characters of each segment, at least one boundary character is extracted from each of the mutually adjacent sides of any two adjacent segments, so that a plurality of boundary character strings to be matched are obtained. During matching, a target character string matched with a reference character string is determined from the plurality of segments of character strings to be matched and the plurality of boundary character strings to be matched. In this way, the completeness of all matched characters is guaranteed, character string matching efficiency is improved, the time consumed by character string matching is reduced, and character string matching performance is improved.

Description

Character string matching method and character string matching system

Technical Field

The present application relates to the field of information processing technologies, and in particular, to a character string matching method and a character string matching system.

Background

Hot-topic report detection technology can find and summarize important information and content from social media, detect hot topics in reports from web texts, and track the evolution of those topics in real time.

In hot-topic report detection, character string matching is a key technology. Character string matching is performed every time a user detects hot topics: when the user searches by keyword, the search engine searches the information and content of social media, and if information consistent with what the user requires is found, it is returned to the user for review and selection.

With the continuous development of social media, the volume of reported web text keeps growing. As a result, when hot topics are detected, the length of the character strings that need to be matched increases exponentially, the time consumed by character string matching keeps rising, and matching efficiency keeps falling.

Disclosure of Invention

In view of the above, an object of the present application is to provide a character string matching method and a character string matching system, in which a text to be matched is divided into a plurality of segments of character strings to be matched and a plurality of boundary character strings to be matched, and the reference character string is matched against the segments and the boundary character strings respectively. When the character strings are matched, the integrity of all matched characters can be ensured, character string matching efficiency is improved, the time consumed by character string matching is reduced, and character string matching performance is improved.

In a first aspect, the present application provides a character string matching method, including:

acquiring a text to be matched and a reference character string aiming at the text to be matched;

determining a plurality of sections of character strings to be matched from the text to be matched, wherein the character length of the character strings to be matched is greater than or equal to the character length of the reference character string;

respectively extracting at least one boundary character from the mutually adjacent sides of any two adjacent sections of character strings to be matched, and determining a plurality of sections of boundary character strings to be matched, wherein each section of boundary character string to be matched comprises a plurality of boundary characters extracted from the adjacent two sections of character strings to be matched, and the character length of each section of boundary character string to be matched is greater than or equal to that of the reference character string;

and determining a target character string matched with the reference character string from the plurality of sections of character strings to be matched and the plurality of sections of boundary character strings to be matched.

Preferably, after the target character string matched with the reference character string is determined from the plurality of segments of character strings to be matched and the plurality of segments of boundary character strings to be matched, the character string matching method further includes:

and counting the number of the target character strings matched with the reference character strings.

Preferably, the plurality of segments of the character strings to be matched are determined by:

acquiring the character length of the reference character string;

determining the division step length of the text to be matched based on the character length of the reference character string;

and based on the division step length, dividing the character strings of the text to be matched by taking the first character of the text to be matched as a starting point, and determining a plurality of sections of character strings to be matched.

Preferably, the boundary string to be matched is determined by:

determining the character length of the boundary character string to be matched;

extracting boundary characters from two adjacent segments of character strings to be matched based on the character length of the boundary character strings to be matched;

and determining the extracted boundary character as a boundary character string to be matched.

Preferably, the character length of the boundary character string to be matched is determined by the following steps:

determining the character length of boundary characters extracted from the mutually adjacent sides of any two adjacent character strings to be matched based on the character length of the reference character string;

and determining the character length of the boundary character string to be matched based on the character length of the boundary character.

Preferably, the character length of the boundary character string to be matched is determined by the following formula:

M = 2 × (m - 1);

wherein M represents the character length of the boundary character string to be matched, m represents the character length of the reference character string, and m - 1 represents the character length of the boundary characters extracted from each side.

Preferably, the determining, from the plurality of segments of character strings to be matched and the plurality of segments of boundary character strings to be matched, the target character string matched with the reference character string includes:

determining initial characters of the character string to be matched and the boundary character string to be matched;

and respectively searching a target character string which is the same as the reference character string from each section of character string to be matched and each section of boundary character string to be matched by taking the determined initial character as a starting point and the character length of the reference character string as a matching step length.

In a second aspect, the present application provides a string matching system, comprising:

the acquisition module is used for acquiring a text to be matched and a reference character string aiming at the text to be matched;

the first determining module is used for determining a plurality of sections of character strings to be matched from the text to be matched, wherein the character length of the character strings to be matched is greater than or equal to the character length of the reference character string;

the second determining module is used for respectively extracting at least one boundary character from the mutually adjacent sides of any two adjacent sections of character strings to be matched and determining a plurality of sections of boundary character strings to be matched, wherein each section of boundary character string to be matched comprises a plurality of boundary characters extracted from the two adjacent sections of character strings to be matched, and the character length of each section of boundary character string to be matched is greater than or equal to the character length of the reference character string;

and the third determining module is used for determining a target character string matched with the reference character string from the multiple segments of character strings to be matched and the multiple segments of boundary character strings to be matched.

Preferably, after the third determining module is configured to determine a target character string matching the reference character string from the multiple segments of character strings to be matched and the multiple segments of boundary character strings to be matched, the character string matching system further includes:

and the counting module is used for counting the number of the target character strings matched with the reference character strings.

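Preferably, the first determining module is configured to determine a plurality of segments of character strings to be matched by:

acquiring the character length of the reference character string;

determining the division step length of the text to be matched based on the character length of the reference character string;

and based on the division step length, dividing the text to be matched by taking the first character of the text to be matched as a starting point, and determining a plurality of segments of character strings to be matched.

Preferably, the second determining module is configured to determine the boundary character string to be matched by:

determining the character length of the boundary character string to be matched;

extracting boundary characters from two adjacent segments of character strings to be matched based on the character length of the boundary character string to be matched;

and determining the extracted boundary characters as a boundary character string to be matched.

Preferably, the second determining module is further configured to determine the character length of the boundary character string to be matched by:

determining the character length of boundary characters extracted from the mutually adjacent sides of any two adjacent character strings to be matched based on the character length of the reference character string;

and determining the character length of the boundary character string to be matched based on the character length of the boundary characters.

Preferably, the second determining module is configured to determine the character length of the boundary character string to be matched by the following formula:

M = 2 × (m - 1);

wherein M represents the character length of the boundary character string to be matched, m represents the character length of the reference character string, and m - 1 represents the character length of the boundary characters extracted from each side.

Preferably, when the third determining module is configured to determine the target character string matched with the reference character string from the multiple segments of the character strings to be matched and the multiple segments of the boundary character strings to be matched, the third determining module is specifically configured to:

determine initial characters of the character string to be matched and the boundary character string to be matched;

and search, from each segment of character string to be matched and each segment of boundary character string to be matched respectively, for a target character string which is the same as the reference character string, by taking the determined initial character as a starting point and the character length of the reference character string as a matching step length.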

In a third aspect, the present application provides an electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the string matching method according to the first aspect.

In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the string matching method according to the first aspect.

The embodiment of the application provides a character string matching method and a character string matching system. When character string matching is carried out, a text to be matched is first divided into a plurality of segments of character strings to be matched. To avoid missing the boundary characters of each segment, at least one boundary character is extracted from each of the mutually adjacent sides of any two adjacent segments, so that a plurality of boundary character strings to be matched are obtained. A target character string matched with a reference character string is then determined from the plurality of segments of character strings to be matched and the plurality of boundary character strings to be matched. In other words, the text to be matched is divided into segments and boundary character strings, and each of these is matched against the reference character string. When the character strings are matched, the integrity of all matched characters can be ensured, character string matching efficiency is improved, the time consumed by character string matching is reduced, and character string matching performance is improved.

In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.

Fig. 1 is a flowchart of a string matching method according to an embodiment of the present disclosure;

Fig. 2 is a flowchart of another string matching method provided in the embodiments of the present application;

Fig. 3 is a first schematic structural diagram of a string matching system according to an embodiment of the present application;

Fig. 4 is a second schematic structural diagram of a string matching system according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. Every other embodiment that can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present application falls within the protection scope of the present application.

In the report detection technology of the hot topic, character string matching is a key technology. The user needs to perform character string matching every time the user detects the hot topics, when the user searches for information by keywords, the search engine searches for important information and content of social media, and if the information which is consistent with the content required by the user is found, the inquired information is returned to the user so that the user can check and select the information. With the continuous development of social media, the reported contents of web texts are more and more, so that the length of a character string to be matched is exponentially increased when a hot topic is detected, the time consumed in the character string matching process is continuously increased, and the matching efficiency of the character string is reduced. Based on this, the embodiment of the application provides a character string matching method and a character string matching system, and a parallel matching processing method is adopted to perform character string matching, so that the matching efficiency of character strings is improved to a certain extent.

Referring to fig. 1, fig. 1 is a flowchart of a string matching method according to an embodiment of the present disclosure, and as shown in fig. 1, the embodiment of the present disclosure provides a string matching method, where the string matching method includes:

S110, obtaining a text to be matched and a reference character string aiming at the text to be matched.

In this step, the text to be matched can be derived from a network text of a social media, a tweet in the fields of economy, science and technology, sports and the like, a microblog tweet and the like. The reference character string is a keyword, and the keyword exists in web texts and various tweets. In the embodiment of the present application, the number and the character length of the reference character string are not specifically limited.

Specifically, the text to be matched is represented in the form of an array.

S120, determining a plurality of sections of character strings to be matched from the text to be matched, wherein the character length of the character strings to be matched is greater than or equal to the character length of the reference character string.

In this step, the text to be matched is divided into a plurality of segments of character strings to be matched, and the character length of each segment is greater than or equal to that of the reference character string. This ensures that a segment is long enough to contain the reference character string, so that character strings matching the reference character string can be found within the segments and are not omitted.

S130, respectively extracting at least one boundary character from the mutually adjacent sides of any two adjacent sections of character strings to be matched to obtain a plurality of sections of boundary character strings to be matched, wherein each section of boundary character string to be matched comprises a plurality of boundary characters extracted from the adjacent two sections of character strings to be matched, and the character length of each section of boundary character string to be matched is greater than or equal to the character length of the reference character string.

It should be noted that, once the text to be matched is divided into a plurality of segments of character strings to be matched, each segment has boundary characters. Because of the division, a complete character string that straddles a segment boundary is split across two segments; if only the segments were matched, that character string would be omitted and an erroneous matching result would occur.
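The omission described above can be reproduced with a small sketch. The sample text and keyword are made up for illustration, and Python's built-in `str.count` merely stands in for whatever matching algorithm is actually used.

```python
# Illustrative only: the sample text and keyword are assumptions,
# not taken from the application.
text = "hot topic detection"
ref = "topic"
step = len(ref)  # divide using the reference length as the step

# Split the text into consecutive segments of `step` characters.
segments = [text[i:i + step] for i in range(0, len(text), step)]

# Occurrences in the undivided text versus occurrences found when
# only the individual segments are searched.
hits_in_whole_text = text.count(ref)
hits_in_segments = sum(seg.count(ref) for seg in segments)

print(segments)
print(hits_in_whole_text, hits_in_segments)
```

Here the keyword is cut between the first two segments, so searching the segments alone reports fewer hits than searching the whole text.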

Therefore, in this step, at least one boundary character is extracted from each of the mutually adjacent sides of two adjacent segments of character strings to be matched to form one boundary character string to be matched. Extracting the boundary characters of every pair of adjacent segments in this way yields a plurality of boundary character strings to be matched.

In order to ensure that the reference character string can be effectively searched in the obtained boundary character string to be matched, the character length of each section of boundary character string to be matched is greater than or equal to the character length of the reference character string.

Thus, when the reference character string is matched, the character strings matching it can be found, and omissions in character string matching are avoided. Here, "at least one" covers one, two or more boundary characters; the number is determined based on the character length of the reference character string.

S140, determining a target character string matched with the reference character string from the multiple sections of character strings to be matched and the multiple sections of boundary character strings to be matched.

In this step, character strings identical to the reference character string are searched for simultaneously in the plurality of segments of character strings to be matched and the plurality of boundary character strings to be matched, and each such character string is determined to be a target character string. Searching in parallel increases the speed of character string matching. Furthermore, in this step a basic character string matching algorithm can be combined with techniques such as multithreading so that matching is carried out concurrently, and parallel pipeline processing greatly improves the processing efficiency for a long text to be matched.

The basic string matching algorithm adopted in the embodiment of the present application is not limited herein, and common basic string matching algorithms include a KMP algorithm, a BM algorithm, a finite automata algorithm, and the like.
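As one concrete instance of such a basic algorithm, a minimal KMP sketch is given below. The function names are illustrative and do not come from the application, which does not prescribe this or any other particular algorithm.

```python
def build_failure(pattern):
    """For each prefix of pattern, length of its longest proper
    prefix that is also a suffix (the KMP failure table)."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find_all(text, pattern):
    """Return the start index of every (possibly overlapping)
    occurrence of pattern in text."""
    if not pattern:
        return []
    fail = build_failure(pattern)
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep going so overlapping hits are found
    return hits


print(kmp_find_all("abababa", "aba"))
```

Any such routine can be applied unchanged to each segment and each boundary character string, since both are ordinary character strings.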

Furthermore, to improve character string matching performance, the embodiment of the application adopts a parallel processing approach: the text to be matched is segmented, and the segments are then processed in parallel.

The embodiment of the application provides a character string matching method. When character string matching is carried out, a text to be matched is first divided into a plurality of segments of character strings to be matched. To avoid missing the boundary characters of each segment, at least one boundary character is extracted from each of the mutually adjacent sides of any two adjacent segments, so that a plurality of boundary character strings to be matched are obtained. A target character string matched with a reference character string is then determined from the plurality of segments of character strings to be matched and the plurality of boundary character strings to be matched. Therefore, when the character strings are matched, the integrity of all matched characters can be ensured, character string matching efficiency is improved, the time consumed by character string matching is reduced, and character string matching performance is improved.

Referring to fig. 2, fig. 2 is a flowchart of another string matching method according to an embodiment of the present disclosure; as shown in fig. 2, the character string matching method includes:

S210, obtaining a text to be matched and a reference character string aiming at the text to be matched.

S220, determining a plurality of sections of character strings to be matched from the text to be matched, wherein the character length of the character strings to be matched is greater than or equal to the character length of the reference character string.

S230, respectively extracting at least one boundary character from the mutually adjacent sides of any two adjacent sections of character strings to be matched to obtain a plurality of sections of boundary character strings to be matched, wherein each section of boundary character string to be matched comprises a plurality of boundary characters extracted from the adjacent two sections of character strings to be matched, and the character length of each section of boundary character string to be matched is greater than or equal to the character length of the reference character string.

S240, determining a target character string matched with the reference character string from the multiple sections of character strings to be matched and the multiple sections of boundary character strings to be matched.

For the descriptions of S210 to S240, reference may be made to the descriptions of S110 to S140; the same technical effects can be achieved, and the details are not repeated here.

S250, counting the number of the target character strings matched with the reference character strings.

In this step, the number of occurrences of the target character string is counted by querying the target character strings that appear in the plurality of segments of character strings to be matched and the plurality of boundary character strings to be matched.

Furthermore, the embodiment of the present application performs character string matching based on the following idea: to obtain a correct matching result, both the segments of the text to be matched and the boundaries between segments must be handled. A distributed parallel processing approach is therefore proposed: a long text is segmented and distributed to nodes for processing, and the number of hits at each node is counted. For the segment boundaries, dedicated nodes perform the matching and then feed the matching results back to the master node. The total number of hits is obtained from all the matching results, thereby achieving parallel acceleration of character string matching.
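The idea above can be sketched on a single machine, with a thread pool standing in for the processing nodes and the calling function playing the role of the master node. This is a sketch under stated assumptions rather than the application's implementation: `count_hits`, `parallel_count`, `segment_len` and `max_workers` are hypothetical names, and `str.find` stands in for the basic matching algorithm.

```python
from concurrent.futures import ThreadPoolExecutor


def count_hits(chunk, ref):
    """Count (possibly overlapping) occurrences of ref inside one chunk."""
    count = 0
    pos = chunk.find(ref)
    while pos != -1:
        count += 1
        pos = chunk.find(ref, pos + 1)
    return count


def parallel_count(text, ref, segment_len=None, max_workers=4):
    """Total hits of ref in text, found via segments plus boundary strings."""
    m = len(ref)
    if m == 0:
        raise ValueError("reference string must not be empty")
    if segment_len is None:
        segment_len = m  # assumption: use the reference length as the step
    if segment_len < m:
        raise ValueError("segments must be at least as long as the reference")

    # Segments handled by the ordinary "nodes".
    segments = [text[i:i + segment_len]
                for i in range(0, len(text), segment_len)]

    # Boundary strings handled by the "special nodes": up to m - 1
    # characters from each side of every segment boundary.
    k = m - 1
    boundaries = [left[max(0, len(left) - k):] + right[:k]
                  for left, right in zip(segments, segments[1:])]

    chunks = segments + boundaries
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_chunk = list(pool.map(count_hits, chunks, [ref] * len(chunks)))

    # The "master node" sums the hits reported for every chunk.
    return sum(per_chunk)


print(parallel_count("xabcabcx", "abc"))
```

Because a boundary string holds at most m - 1 characters from each side, any hit inside it must straddle the boundary, so no occurrence is counted both in a segment and in a boundary string. In CPython, threads mainly illustrate the structure; a process pool or separate machines would be the natural stand-in for real nodes.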

In the embodiment of the present application, as a preferred embodiment, the multiple segments of character strings to be matched are determined through the following steps:

acquiring the character length of the reference character string;

In this step, more than one reference character string may be obtained. When there are multiple reference character strings, the maximum character length among all of them is obtained; when there is only one reference character string, only its character length needs to be obtained.

And determining the division step length of the text to be matched based on the character length of the reference character string.

In this step, the character length of the reference character string is used as a dividing step length, and the text to be matched is divided based on the dividing step length.

In the embodiment of the present application, the character strings of the text to be matched form a one-dimensional array, which is divided using the character length of the reference character string as the division step. For example, with a division step of 3: [ a1, a2, a3 | a4, a5, a6 | ... | an-2, an-1, an ], thereby determining a plurality of character strings to be matched.
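The division described above can be sketched as follows. The helper names `division_step` and `split_segments` are hypothetical, and a Python string is used in place of the one-dimensional character array.

```python
def division_step(refs):
    """Division step: the character length of the reference string, or the
    maximum length when several reference strings are given."""
    if not refs or any(len(r) == 0 for r in refs):
        raise ValueError("at least one non-empty reference string is required")
    return max(len(r) for r in refs)


def split_segments(text, step):
    """Divide text into consecutive segments of `step` characters, starting
    from the first character; the last segment may be shorter."""
    return [text[i:i + step] for i in range(0, len(text), step)]


step = division_step(["abc", "de"])
segments = split_segments("abcdefgh", step)
print(step, segments)
```

When the text length is not a multiple of the step, the final segment is simply shorter than the others; how that tail is treated is not spelled out in the text, so keeping it as a short segment is an assumption of this sketch.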

In the embodiment of the present application, as a preferred embodiment, the boundary character string to be matched is determined by the following steps:

determining the character length of the boundary character string to be matched;

In this step, since the boundary character string to be matched consists of a plurality of boundary characters extracted from two adjacent segments of character strings to be matched, the character length of the newly formed boundary character string is not known in advance, and therefore needs to be determined.

extracting boundary characters from two adjacent segments of character strings to be matched based on the character length of the boundary character string to be matched;

In this step, based on the determined character length of the boundary character string to be matched, the boundary characters are extracted from two adjacent segments of character strings to be matched, so that the character length of the boundary character string composed of the extracted boundary characters equals the predetermined character length.

and determining the extracted boundary characters as a boundary character string to be matched.

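In the embodiment of the present application, as a preferred embodiment, the character length of the boundary character string to be matched is determined by the following steps:

determining the character length of boundary characters extracted from the mutually adjacent sides of any two adjacent character strings to be matched based on the character length of the reference character string;

and determining the character length of the boundary character string to be matched based on the character length of the boundary characters.

In this step, the character length of the boundary character string to be matched is the sum of the character lengths of the boundary characters extracted from the mutually adjacent sides of the two adjacent segments of character strings to be matched.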

Specifically, the character length of the boundary character string to be matched is determined by the following formula:

M = 2 × (m - 1);

According to the formula, m - 1 boundary characters are extracted from each of the adjacent sides of any two adjacent character strings to be matched, and the character length of the boundary character string to be matched is determined to be 2 × (m - 1), wherein m represents the character length of the reference character string.
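A minimal sketch of this extraction is shown below, assuming the segments are already available as Python strings. The function name `boundary_strings` is hypothetical.

```python
def boundary_strings(segments, m):
    """For every pair of adjacent segments, join the last m - 1 characters of
    the left segment with the first m - 1 characters of the right segment.

    m is the character length of the reference string, so each result has
    length M = 2 * (m - 1) whenever both neighbours are long enough.
    """
    if m < 1:
        raise ValueError("reference length must be at least 1")
    k = m - 1
    result = []
    for left, right in zip(segments, segments[1:]):
        # max(0, ...) keeps the slice correct when k is 0 or exceeds len(left).
        tail = left[max(0, len(left) - k):]
        head = right[:k]
        result.append(tail + head)
    return result


segments = ["xab", "cab", "cx"]
print(boundary_strings(segments, m=3))
```

With m = 3, each boundary string has 2 × (3 - 1) = 4 characters. If the final segment holds fewer than m - 1 characters, the corresponding boundary string is shorter than M; that edge case is handled here by taking whatever characters exist, which is an assumption of the sketch.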

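In the embodiment of the present application, as a preferred embodiment, step S240 includes:

determining initial characters of the character string to be matched and the boundary character string to be matched;

and respectively searching a target character string which is the same as the reference character string from each segment of character string to be matched and each segment of boundary character string to be matched, by taking the determined initial character as a starting point and the character length of the reference character string as a matching step length.

In this step, the character strings to be matched are one-dimensional arrays. With the determined initial character as the starting point, the total character length of the character string to be matched as the end point, and the character length of the reference character string as the matching step length, a target character string which is the same as the reference character string is searched for in each segment of character strings to be matched. The boundary character strings to be matched are searched for the reference character string in the same way.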
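One way to read this step is as a window comparison: starting from the initial character, a window as wide as the reference character string is compared against the reference at each position until the end of the string is reached. The sketch below follows that reading, which is an interpretation of "matching step length" rather than something the text states outright; `find_targets` is a hypothetical name.

```python
def find_targets(chunk, ref):
    """Return the start positions in chunk where a window of len(ref)
    characters equals ref. Works for segments and boundary strings alike."""
    m = len(ref)
    if m == 0:
        raise ValueError("reference string must not be empty")
    # The last valid window starts at len(chunk) - m; if chunk is shorter
    # than ref the range is empty and no position is returned.
    return [i for i in range(0, len(chunk) - m + 1) if chunk[i:i + m] == ref]


print(find_targets("abca", "abc"))   # a boundary string built from "xab" and "cab"
print(find_targets("cab", "abc"))    # a segment that does not contain the reference
```

A faster basic algorithm can replace the naive comparison inside the function without changing how segments and boundary strings are fed to it.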

The embodiment of the application provides a character string matching method. When character string matching is carried out, a text to be matched is first divided into a plurality of segments of character strings to be matched, wherein the character length of each character string to be matched is greater than or equal to the character length of a reference character string. To avoid missing the boundary characters of each segment, at least one boundary character is extracted from each of the mutually adjacent sides of any two adjacent segments, so that a plurality of boundary character strings to be matched are obtained; each boundary character string to be matched comprises a plurality of boundary characters extracted from the two adjacent segments, and its character length is greater than or equal to the character length of the reference character string. During matching, a target character string matched with the reference character string is determined from the plurality of segments of character strings to be matched and the plurality of boundary character strings to be matched, and finally the number of target character strings matched with the reference character string is counted. Therefore, when the character strings are matched, the integrity of all matched characters can be ensured, character string matching efficiency is improved, the time consumed by character string matching is reduced, and character string matching performance is improved.

Based on the same inventive concept, a character string matching system corresponding to the character string matching method is provided in the embodiments of the present application, and because the principle of solving the problem of the character string matching system in the embodiments of the present application is similar to that of the character string matching method in the embodiments of the present application, the implementation of the system can refer to the implementation of the method, and repeated details are not repeated.

Referring to fig. 3 and 4, fig. 3 is a first schematic structural diagram of a string matching system according to an embodiment of the present application, and fig. 4 is a second schematic structural diagram of a string matching system according to an embodiment of the present application. As shown in fig. 3, the character string matching system 300 includes:

an obtaining module 310, configured to obtain a text to be matched and a reference character string for the text to be matched;

a first determining module 320, configured to determine multiple segments of character strings to be matched from the text to be matched, where a character length of the character string to be matched is greater than or equal to a character length of the reference character string;

a second determining module 330, configured to extract at least one boundary character from each of the mutually adjacent sides of any two adjacent segments of character strings to be matched, and determine a plurality of segments of boundary character strings to be matched, where each segment of boundary character string to be matched includes a plurality of boundary characters extracted from two adjacent segments of character strings to be matched, and the character length of each segment of boundary character string to be matched is greater than or equal to the character length of the reference character string;

a third determining module 340, configured to determine a target character string matched with the reference character string from the multiple segments of character strings to be matched and the multiple segments of boundary character strings to be matched.

Further, as shown in fig. 4, the character string matching system 300 further includes the following module, which operates after the third determining module 340 has determined the target character string matched with the reference character string from the multiple segments of character strings to be matched and the multiple segments of boundary character strings to be matched:

a counting module 350, configured to count the number of target character strings matched with the reference character string.
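As a rough illustration of how the modules 310 to 350 could fit together, the following Python sketch maps each module to one method of a single class. The class name, method names and the `step` field (a stand-in for the division step length) are hypothetical, and the naive `str.find` loop is only an assumed matcher; the source does not prescribe this structure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StringMatchingSystem:
    step: int  # hypothetical division step length, assumed >= len(reference)

    def obtain(self, text: str, reference: str) -> Tuple[str, str]:
        # Obtaining module 310: receive the text and its reference string.
        if not reference or self.step < len(reference):
            raise ValueError("step must be >= len(reference) > 0")
        return text, reference

    def determine_segments(self, text: str) -> List[str]:
        # First determining module 320: divide the text from its first character.
        return [text[i:i + self.step] for i in range(0, len(text), self.step)]

    def determine_boundaries(self, segments: List[str], reference: str) -> List[str]:
        # Second determining module 330: m - 1 characters from each adjacent side.
        k = len(reference) - 1
        if k == 0:
            return []
        return [a[-k:] + b[:k] for a, b in zip(segments, segments[1:])]

    def determine_targets(self, pieces: List[str], reference: str) -> List[str]:
        # Third determining module 340: collect every matched target string.
        targets = []
        for piece in pieces:
            pos = piece.find(reference)
            while pos != -1:
                targets.append(piece[pos:pos + len(reference)])
                pos = piece.find(reference, pos + 1)
        return targets

    def count(self, targets: List[str]) -> int:
        # Counting module 350: number of matched target strings.
        return len(targets)

    def run(self, text: str, reference: str) -> int:
        text, reference = self.obtain(text, reference)
        segments = self.determine_segments(text)
        boundaries = self.determine_boundaries(segments, reference)
        targets = self.determine_targets(segments + boundaries, reference)
        return self.count(targets)
```

A call such as `StringMatchingSystem(step=4).run("abcabcabca", "abc")` passes the data through the five stages in the order listed above.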

In this embodiment of the application, the first determining module 320 is configured to determine a plurality of segments of character strings to be matched by:

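acquiring the character length of the reference character string;

determining the division step length of the text to be matched based on the character length of the reference character string;

based on the division step length, with the first character of the text to be matched as a starting point, dividing the text to be matched into character strings, and determining the plurality of segments of character strings to be matched.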

In this embodiment, as a preferred embodiment, the second determining module 330 is configured to determine the boundary character string to be matched through the following steps:

In this embodiment of the application, the second determining module 330 is further configured to determine the character length of the boundary character string to be matched by:

In this embodiment, as a preferred embodiment, the second determining module 330 is configured to determine the character length of the boundary character string to be matched by the following formula:

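M = 2 × (m - 1);

wherein M represents the character length of the boundary character string to be matched, m represents the character length of the reference character string, and m - 1 represents the character length of the boundary characters.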
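The formula can be checked with a short Python sketch, taking M as the character length of the boundary character string and m as the character length of the reference character string, as the claims define them. The function names are hypothetical. The second function illustrates one reading of why m - 1 characters per side suffice: every window of length m inside a boundary string of length 2 × (m - 1) necessarily crosses the joint between the two segments, so a boundary match never repeats a match lying wholly inside one segment.

```python
def boundary_length(m: int) -> int:
    """Character length M of a boundary string for a reference of length m."""
    return 2 * (m - 1)


def every_window_crosses_joint(m: int) -> bool:
    """Check that each length-m window in a boundary string spans the joint.

    The left m - 1 characters occupy indices 0 .. m-2 and the right m - 1
    characters occupy indices m-1 .. 2m-3, so the joint lies before index m-1.
    """
    k = m - 1
    total = boundary_length(m)
    # A window starting at `start` covers indices start .. start + m - 1.
    return all(start < k <= start + m - 1 for start in range(total - m + 1))
```

With m = 3, for example, `boundary_length(3)` is 4, and both possible windows (starting at index 0 and index 1) include characters from both sides of the joint.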

In this embodiment of the application, when the third determining module 340 is configured to determine, from the multiple segments of the character strings to be matched and the multiple segments of the boundary character strings to be matched, a target character string that matches the reference character string, the third determining module 340 is specifically configured to:

The embodiment of the application provides a character string matching system, which comprises an obtaining module, a first determining module, a second determining module and a third determining module. The obtaining module is used for obtaining a text to be matched and a reference character string for the text to be matched. The first determining module is used for determining a plurality of segments of character strings to be matched from the text to be matched, wherein the character length of each character string to be matched is greater than or equal to the character length of the reference character string. The second determining module is used for extracting at least one boundary character from each of the mutually adjacent sides of any two adjacent segments of character strings to be matched and determining a plurality of segments of boundary character strings to be matched, wherein each segment of boundary character string to be matched comprises a plurality of boundary characters extracted from the two adjacent segments of character strings to be matched, and the character length of each segment of boundary character string to be matched is greater than or equal to that of the reference character string. The third determining module is used for determining a target character string matched with the reference character string from the multiple segments of character strings to be matched and the multiple segments of boundary character strings to be matched.

Therefore, when character strings are matched, the completeness of all matched characters in the matching process can be guaranteed, the character string matching efficiency is effectively improved, the time consumed by character string matching is greatly reduced, and the character string matching performance is improved.

Referring to fig. 5, fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 5, the electronic device 500 includes a processor 510, a memory 520, and a bus 530.

The memory 520 stores machine-readable instructions executable by the processor 510. When the electronic device 500 runs, the processor 510 communicates with the memory 520 through the bus 530, and the machine-readable instructions, when executed by the processor 510, may perform the steps of the character string matching method shown in fig. 1 or fig. 2.

An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps of the character string matching method shown in fig. 1 or fig. 2 may be performed.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative. For example, the division of the units is only a logical division, and other divisions are possible in actual implementation; as another example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a processor-executable non-volatile computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present application, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A character string matching method, characterized in that the character string matching method comprises:

determining a plurality of segments of character strings to be matched by the following steps: acquiring the character length of the reference character string; determining the division step length of the text to be matched based on the character length of the reference character string; based on the division step length, with the first character of the text to be matched as a starting point, dividing character strings of the text to be matched, and determining a plurality of sections of character strings to be matched;

respectively extracting at least one boundary character from the mutually adjacent sides of any two adjacent sections of character strings to be matched, and determining a plurality of sections of boundary character strings to be matched; each segment of boundary character string to be matched comprises a plurality of boundary characters extracted from two adjacent segments of character strings to be matched, and the character length of the boundary character string to be matched is the sum of the character lengths of the boundary characters extracted from the adjacent sides of the two adjacent segments of character strings to be matched;

determining the character length of the boundary character string to be matched by the following formula:

M = 2 × (m - 1);

wherein M represents the character length of the boundary character string to be matched, m represents the character length of the reference character string, and m - 1 represents the character length of the boundary characters;

2. The character string matching method according to claim 1, wherein after determining a target character string that matches the reference character string from among the plurality of pieces of character strings to be matched and the plurality of pieces of boundary character strings to be matched, the character string matching method further comprises:

3. The character string matching method according to claim 1, wherein the boundary character string to be matched is determined by:

4. The character string matching method according to claim 1, wherein the determining of the target character string matching the reference character string from the plurality of segments of the character string to be matched and the plurality of segments of the boundary character string to be matched comprises:

5. A string matching system, characterized in that the string matching system comprises:

the first determining module is used for determining a plurality of segments of character strings to be matched through the following steps: acquiring the character length of the reference character string; determining the division step length of the text to be matched based on the character length of the reference character string; based on the division step length, with the first character of the text to be matched as a starting point, dividing character strings of the text to be matched, and determining a plurality of sections of character strings to be matched;

the second determining module is used for respectively extracting at least one boundary character from the mutually adjacent side of any two adjacent sections of character strings to be matched and determining a plurality of sections of boundary character strings to be matched, wherein each section of boundary character string to be matched comprises a plurality of boundary characters extracted from the two adjacent sections of character strings to be matched, and the character length of the boundary character string to be matched is the sum of the character lengths of the boundary characters extracted from the mutually adjacent sides of the two adjacent sections of character strings to be matched;

the second determining module is used for determining the character length of the boundary character string to be matched through the following formula:

M = 2 × (m - 1);

6. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the string matching method of any of claims 1 to 4.

7. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, performs the steps of the string matching method according to any one of claims 1 to 4.