CN110287147B

CN110287147B - Character string sorting method and device

Info

Publication number: CN110287147B
Application number: CN201910567581.XA
Authority: CN
Inventors: 林荷滨; 李鑫辉; 黄凯
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2019-06-27
Filing date: 2019-06-27
Publication date: 2022-08-19
Anticipated expiration: 2039-06-27
Also published as: CN110287147A

Abstract

The application discloses a character string sorting method comprising the following steps: obtaining a first Unicode encoding and a second Unicode encoding; extracting code segments from the first and second Unicode encodings respectively, in order from high order to low order, until the first code segment extracted from the first Unicode encoding differs from the second code segment extracted from the second Unicode encoding; when the characters corresponding to the first and second code segments are of different character types, determining the order between these two characters according to a preset order of a plurality of character types; determining that the character string containing the larger of the two characters is larger than the other character string; and sorting the two character strings according to the order of the first and second character strings. With the disclosed method, a plurality of character strings can be sorted quickly even when they contain several types of characters.

Description

Character string sorting method and device

Technical Field

The application belongs to the technical field of computers, and particularly relates to a character string sorting method and device.

Background

With the continuous development of storage technology, large numbers of files can be stored on both terminals and servers; for example, the server of a video website stores a large number of video files. To make the files easier to manage and search, they are typically sorted. Sorting a plurality of files is essentially sorting their names, that is, sorting a plurality of character strings.

Because file names are often arbitrary, they contain various types of characters. How to sort a plurality of character strings composed of multiple types of characters is a technical problem faced by those skilled in the art.

Disclosure of Invention

In view of the above, an object of the present application is to provide a character string sorting method and device capable of sorting a plurality of character strings composed of multiple types of characters.

In order to achieve the above purpose, the present application provides the following technical solutions:

The application provides a character string sorting method, comprising the following steps:

obtaining a first Unicode encoding and a second Unicode encoding, where the first Unicode encoding corresponds to a first character string to be sorted and the second Unicode encoding corresponds to a second character string to be sorted;

extracting code segments from the first Unicode encoding and the second Unicode encoding respectively, in order from high order to low order, until the first code segment extracted from the first Unicode encoding differs from the second code segment extracted from the second Unicode encoding, where each code segment extracted from the first Unicode encoding corresponds to one character in the first character string and each code segment extracted from the second Unicode encoding corresponds to one character in the second character string;

when the character corresponding to the first code segment and the character corresponding to the second code segment are of different character types, determining the order between these two characters according to a preset order of a plurality of character types;

determining that the character string containing the larger of the character corresponding to the first code segment and the character corresponding to the second code segment is larger than the other character string; and

sorting the first character string and the second character string according to the order determined between them.

Optionally, on the basis of the above method, the method further includes:

when the character corresponding to the first code segment and the character corresponding to the second code segment are both of the numeric type, extracting a third code segment from the first Unicode encoding and a fourth code segment from the second Unicode encoding, where the third code segment corresponds to a first numeric string and the fourth code segment corresponds to a second numeric string; the first numeric string is the numeric string in the first character string to which the character corresponding to the first code segment belongs, and the second numeric string is the numeric string in the second character string to which the character corresponding to the second code segment belongs;

converting the third code segment into first data of floating-point type, and converting the fourth code segment into second data of floating-point type; and

determining that the character string corresponding to the larger of the first data and the second data is larger than the other character string.

Optionally, on the basis of the above method, the method further includes:

when the character corresponding to the first code segment and the character corresponding to the second code segment are both Chinese characters, obtaining a first pinyin and a second pinyin, where the first pinyin is the pinyin of the character corresponding to the first code segment and the second pinyin is the pinyin of the character corresponding to the second code segment;

extracting letters from the first pinyin and the second pinyin respectively, from left to right;

if the two letters at the same position in the first pinyin and the second pinyin differ, comparing the two letters and determining that the character string corresponding to the larger letter is larger than the other character string;

if the first pinyin and the second pinyin contain the same number of letters and the letters at each position are the same, comparing a first tone of the first pinyin with a second tone of the second pinyin, and determining that the character string corresponding to the larger tone is larger than the other character string.

Optionally, on the basis of the above method, the method further includes:

if the first pinyin is the same as the second pinyin, obtaining a first ASCII code of the character corresponding to the first code segment and a second ASCII code of the character corresponding to the second code segment, comparing the first ASCII code with the second ASCII code, and determining that the character string corresponding to the larger ASCII code is larger than the other character string.

Optionally, on the basis of the above method, the method further includes:

if all code segments of either the first or the second Unicode encoding have been extracted without a differing pair of code segments being found, determining that the character string whose Unicode encoding is longer is larger than the other character string.
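The length rule applies only when one string runs out before any differing code segment appears, so that one encoding is a prefix of the other. A minimal Python sketch, assuming UTF-8 encodings: because UTF-8 is prefix-free, a byte-level prefix check lines up with whole code segments. The function name `compare_prefix` is hypothetical.

```python
from typing import Optional


def compare_prefix(s1: str, s2: str) -> Optional[int]:
    """Length rule: -1, 0 or 1, or None if a differing code segment exists."""
    e1, e2 = s1.encode("utf-8"), s2.encode("utf-8")
    n = min(len(e1), len(e2))
    if e1[:n] != e2[:n]:
        return None  # a differing segment exists; the other rules decide
    # One encoding is a prefix of the other: the longer one is larger.
    return (len(e1) > len(e2)) - (len(e1) < len(e2))
```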

The present application further provides a character string sorting device, including:

a data acquisition unit, configured to obtain a first Unicode encoding and a second Unicode encoding, where the first Unicode encoding corresponds to a first character string to be sorted and the second Unicode encoding corresponds to a second character string to be sorted;

a code segment extraction unit, configured to extract code segments from the first Unicode encoding and the second Unicode encoding respectively, in order from high order to low order, until the first code segment extracted from the first Unicode encoding differs from the second code segment extracted from the second Unicode encoding, where each code segment extracted from the first Unicode encoding corresponds to one character in the first character string and each code segment extracted from the second Unicode encoding corresponds to one character in the second character string;

a character type comparison unit, configured to, when the character corresponding to the first code segment and the character corresponding to the second code segment are of different character types, determine the order between these two characters according to a preset order of a plurality of character types, and determine that the character string containing the larger of the two characters is larger than the other character string; and

a sorting unit, configured to sort the first character string and the second character string according to the order determined between them.

Optionally, on the basis of the above apparatus, the apparatus further includes:

a numeric string comparison unit, configured to: when the character corresponding to the first code segment and the character corresponding to the second code segment are both of the numeric type, extract a third code segment from the first Unicode encoding and a fourth code segment from the second Unicode encoding, where the third code segment corresponds to a first numeric string and the fourth code segment corresponds to a second numeric string, the first numeric string being the numeric string in the first character string to which the character corresponding to the first code segment belongs and the second numeric string being the numeric string in the second character string to which the character corresponding to the second code segment belongs; convert the third code segment into first data of floating-point type and the fourth code segment into second data of floating-point type; and determine that the character string corresponding to the larger of the first data and the second data is larger than the other character string;

a pinyin comparison unit, configured to: when the character corresponding to the first code segment and the character corresponding to the second code segment are both Chinese characters, obtain a first pinyin (the pinyin of the character corresponding to the first code segment) and a second pinyin (the pinyin of the character corresponding to the second code segment); extract letters from the first pinyin and the second pinyin respectively, from left to right; if the two letters at the same position differ, compare them and determine that the character string corresponding to the larger letter is larger than the other character string; and if the first pinyin and the second pinyin contain the same number of letters and the letters at each position are the same, compare a first tone of the first pinyin with a second tone of the second pinyin and determine that the character string corresponding to the larger tone is larger than the other character string;

a standard code comparison unit, configured to, when the pinyin comparison unit determines that the first pinyin is the same as the second pinyin, obtain a first ASCII code of the character corresponding to the first code segment and a second ASCII code of the character corresponding to the second code segment, compare the two ASCII codes, and determine that the character string corresponding to the larger ASCII code is larger than the other character string; and

a length comparison unit, configured to, when all code segments of either the first or the second Unicode encoding have been extracted without a differing pair of code segments being found, determine that the character string whose Unicode encoding is longer is larger than the other character string.

It can thus be seen that, in the character string sorting method disclosed by the application, for a first character string and a second character string to be sorted, a first Unicode encoding corresponding to the first character string and a second Unicode encoding corresponding to the second character string are obtained. Code segments are then extracted from the two encodings in order from high order to low order; while the two extracted code segments are the same, extraction continues, until the first code segment extracted from the first Unicode encoding differs from the second code segment extracted from the second Unicode encoding. If the characters corresponding to these two code segments are of different character types, their order is determined from the preset order of the character types, which in turn determines the order of the first and second character strings, and the two strings are sorted accordingly. Because the strings to be sorted are converted into Unicode encodings and the encodings are compared to determine the order of the strings, that order can be determined quickly even when the strings contain various types of characters, which improves the efficiency of string sorting.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following descriptions are some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a schematic flow chart diagram of a method for sorting strings disclosed herein;

FIG. 2 is a schematic flow chart diagram of another string sorting method disclosed in the present application;

FIG. 3 is a schematic flow chart diagram of another string sorting method disclosed in the present application;

FIG. 4 is a schematic flow chart diagram of another string sorting method disclosed in the present application;

FIG. 5 is a schematic diagram of a character string sorting apparatus according to the present disclosure;

FIG. 6 is a schematic structural diagram of another character string sorting apparatus disclosed in the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Fig. 1 is a schematic flow chart of a character string sorting method disclosed in the present application. The method comprises the following steps:

S101: Obtain a first Unicode encoding and a second Unicode encoding.

The first Unicode encoding corresponds to the first character string to be sorted, and the second Unicode encoding corresponds to the second character string to be sorted.

In implementation, two character strings to be sorted are obtained, referred to as the first character string and the second character string, and the Unicode encoding of each is determined: the encoding of the first character string is called the first Unicode encoding, and the encoding of the second character string is called the second Unicode encoding. Unicode (also called the "universal code" or "unified code" in Chinese) has several encoding schemes, such as UTF-8, UTF-16 and UTF-32.

UTF-8 is a variable-length encoding that uses 1 to 4 bytes per character, and its encoding rules depend on how many bytes the character needs. For a single-byte character, the first bit is set to 0 and the following 7 bits hold the character's Unicode code point. For a character that needs N bytes (N > 1), the first N bits of the first byte are all set to 1, the (N + 1)th bit is set to 0, the first two bits of each of the remaining N - 1 bytes are set to 10, and the remaining bits are filled with the character's Unicode code point.
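The UTF-8 rules above can be written out directly. The following Python sketch encodes a single code point by hand (the function name `utf8_encode_code_point` is illustrative, not from the source) and can be checked against Python's built-in `str.encode("utf-8")`. It does not reject surrogate code points, which a strict encoder would.

```python
def utf8_encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point using the UTF-8 rules described above."""
    if cp < 0x80:
        # Single byte: leading bit 0, then the 7-bit code point.
        return bytes([cp])
    # Choose N, the number of bytes, from the size of the code point.
    if cp < 0x800:
        n = 2
    elif cp < 0x10000:
        n = 3
    elif cp <= 0x10FFFF:
        n = 4
    else:
        raise ValueError("code point outside the Unicode range")
    # Each continuation byte is 10xxxxxx and carries 6 bits, lowest bits last.
    tail = []
    for _ in range(n - 1):
        tail.append(0x80 | (cp & 0x3F))
        cp >>= 6
    # First byte: N ones, one zero, then the remaining high bits.
    lead = ((0xFF << (8 - n)) & 0xFF) | cp
    return bytes([lead] + tail[::-1])
```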

UTF-16 is also a variable-length encoding, using 2 or 4 bytes per character. UTF-32 uses one 32-bit unsigned integer per character: the UTF-32 encoding of a character is its code point stored as a 32-bit unsigned integer.

UTF-8 can save storage space compared with UTF-16 and UTF-32. Preferably, the first Unicode encoding and the second Unicode encoding are in UTF-8 format.
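The storage comparison depends on the text itself. The Python sketch below (with illustrative sample strings) measures the byte length of each encoding form. The big-endian codecs are used so that no byte-order mark is counted. It shows that ASCII-heavy names shrink considerably in UTF-8, while each common Chinese character takes 3 bytes in UTF-8 versus 2 in UTF-16. The last assertion also confirms that a UTF-32 value is simply the code point.

```python
def encoded_sizes(text: str) -> dict:
    """Byte length of text in each Unicode encoding form, without a BOM."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),  # -be: no 2-byte BOM
        "utf-32": len(text.encode("utf-32-be")),  # -be: no 4-byte BOM
    }


for sample in ["video01", "汉字", "😀"]:
    print(sample, encoded_sizes(sample))
```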

S102: Extract code segments from the first Unicode encoding and the second Unicode encoding respectively, in order from high order to low order, until the first code segment extracted from the first Unicode encoding differs from the second code segment extracted from the second Unicode encoding.

In the present application, the one or more bytes that encode one character in the first or second Unicode encoding are referred to as a code segment. Each code segment extracted from the first Unicode encoding corresponds to one character in the first character string, and each code segment extracted from the second Unicode encoding corresponds to one character in the second character string.

For example, suppose the first and second Unicode encodings each encode 5 characters. In order from high order to low order, the first code segment of the first Unicode encoding corresponds to the first character of the first character string, the second code segment corresponds to the second character, and so on, up to the fifth code segment, which corresponds to the fifth character. Likewise, the first through fifth code segments of the second Unicode encoding correspond to the first through fifth characters of the second character string.

Note that in the first and second Unicode encodings the leftmost bit is the most significant bit and the rightmost bit is the least significant bit. Extracting code segments from high order to low order therefore means extracting them from left to right.

In implementation, the way code segments are extracted from the first and second Unicode encodings is determined by the encoding scheme they use.

Taking first and second Unicode encodings in UTF-8 format as an example:

if a character is represented by one byte, the first bit of its code segment is "0"; if a character is represented by two bytes, its code segment contains two bytes, the first three bits of the first byte are "110", and the first two bits of the second byte are "10"; if a character is represented by three bytes, its code segment contains three bytes, the first four bits of the first byte are "1110", and the first two bits of each other byte are "10"; if a character is represented by four bytes, its code segment contains four bytes, the first five bits of the first byte are "11110", and the first two bits of each other byte are "10".
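Following the lead-byte patterns above, a UTF-8 byte string can be cut into code segments by reading the first byte of each segment to learn its length. A minimal Python sketch (the names `segment_length` and `split_code_segments` are hypothetical) that also checks the "10" prefix of every continuation byte:

```python
from typing import List


def segment_length(lead: int) -> int:
    """Number of bytes in the code segment that begins with this byte."""
    if lead >> 7 == 0b0:        # 0xxxxxxx
        return 1
    if lead >> 5 == 0b110:      # 110xxxxx
        return 2
    if lead >> 4 == 0b1110:     # 1110xxxx
        return 3
    if lead >> 3 == 0b11110:    # 11110xxx
        return 4
    raise ValueError(f"invalid UTF-8 lead byte: {lead:#04x}")


def split_code_segments(data: bytes) -> List[bytes]:
    """Split UTF-8 bytes into per-character code segments, high order first."""
    segments = []
    i = 0
    while i < len(data):
        n = segment_length(data[i])
        seg = data[i:i + n]
        # Every byte after the first must be a 10xxxxxx continuation byte.
        if len(seg) != n or any(b >> 6 != 0b10 for b in seg[1:]):
            raise ValueError(f"malformed code segment at byte {i}")
        segments.append(seg)
        i += n
    return segments
```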

For example, if the first Unicode encoding is "1110011010110001100010010100······", then according to the UTF-8 encoding rules "111001101011000110001001" encodes one character, so the first code segment extracted from the first Unicode encoding is "111001101011000110001001", which corresponds to the character "汉" (han).
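The worked example can be checked by turning the 24-bit code segment back into bytes and decoding it as UTF-8. This short Python snippet confirms that the three bytes E6 B1 89 decode to 汉 (U+6C49):

```python
bits = "111001101011000110001001"  # first code segment from the example
segment = int(bits, 2).to_bytes(len(bits) // 8, "big")
print(segment.hex(), segment.decode("utf-8"))  # e6b189 汉
```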

In practice, a first code segment is extracted starting from the most significant bit of the first Unicode encoding, corresponding to the first character of the first character string, and a first code segment is extracted starting from the most significant bit of the second Unicode encoding, corresponding to the first character of the second character string. The two extracted code segments are then compared. If they are the same, a second code segment is extracted from each encoding, corresponding to the second character of the first and second character strings respectively, and the two are compared again. This continues from high order to low order until the code segment extracted from the first Unicode encoding differs from the code segment extracted from the second Unicode encoding.
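The pairwise extraction loop described above can be sketched in Python as follows, assuming both inputs are valid UTF-8. Decoding and re-encoding one character at a time yields exactly the code segments of each byte string, in high-to-low order. The function name `first_difference` is hypothetical.

```python
def first_difference(a: bytes, b: bytes):
    """Return (index, segment_a, segment_b) for the first differing pair of
    UTF-8 code segments, or None if one encoding runs out first."""
    # One code segment per character, in high-to-low order.
    segs_a = [ch.encode("utf-8") for ch in a.decode("utf-8")]
    segs_b = [ch.encode("utf-8") for ch in b.decode("utf-8")]
    for i, (sa, sb) in enumerate(zip(segs_a, segs_b)):
        if sa != sb:
            return i, sa, sb
    return None  # no differing segment: the length rule decides
```

For "集90" and "集100" the first pair (集, 集) matches, and the loop stops at index 1 with the segments for "9" and "1".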

S103: When the character corresponding to the first code segment and the character corresponding to the second code segment are of different character types, determine the order between the two characters according to a preset order of a plurality of character types.

When the first code segment extracted from the first Unicode encoding differs from the second code segment extracted from the second Unicode encoding, the character types of the characters corresponding to the two code segments are determined. If the character types differ, the order between the two characters is determined according to the preset order of the character types.

For example, the character types used in a Chinese context typically include special characters, letters, numbers and Chinese characters. In one embodiment, the order of these character types is predefined as: special character < number < letter < Chinese character. If the character corresponding to the first code segment is a Chinese character and the character corresponding to the second code segment is a number, this predefined order shows that the character corresponding to the first code segment is larger than the character corresponding to the second code segment.
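The embodiment's type order can be expressed as a rank table. In this Python sketch the classifier `char_type` is hypothetical: it assumes "number" means the ASCII digits 0 to 9, "letter" means ASCII letters, "Chinese character" means the CJK Unified Ideographs block U+4E00 to U+9FFF, and everything else is "special". The text does not define these ranges.

```python
# Assumed preset order: special < number < letter < Chinese character.
TYPE_RANK = {"special": 0, "number": 1, "letter": 2, "chinese": 3}


def char_type(ch: str) -> str:
    """Hypothetical classifier; the character ranges are assumptions."""
    if "0" <= ch <= "9":
        return "number"
    if ch.isascii() and ch.isalpha():
        return "letter"
    if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs only
        return "chinese"
    return "special"


def compare_by_type(c1: str, c2: str) -> int:
    """-1, 0 or 1 according to the preset order of character types."""
    r1, r2 = TYPE_RANK[char_type(c1)], TYPE_RANK[char_type(c2)]
    return (r1 > r2) - (r1 < r2)
```

With this table, a Chinese character compared with the digit "3" returns 1, matching the example in the text.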

S104: Determine that the character string containing the larger of the character corresponding to the first code segment and the character corresponding to the second code segment is larger than the other character string.

That is, if the character corresponding to the first code segment is larger than the character corresponding to the second code segment, the first character string is determined to be larger than the second character string; if the character corresponding to the first code segment is smaller than the character corresponding to the second code segment, the first character string is determined to be smaller than the second character string.

The following example illustrates the technical solution:

Code segments are extracted from the first and second Unicode encodings in order from high order to low order. The code segments extracted the first time are the same, the code segments extracted the second time are the same, and the code segments extracted the third time differ. Suppose the character corresponding to the first code segment (extracted from the first Unicode encoding) is a Chinese character, rendered in translation as "first", and the character corresponding to the second code segment (extracted from the second Unicode encoding) is "3". The character corresponding to the first code segment is therefore of the Chinese character type, and the character corresponding to the second code segment is of the numeric type. According to the preset order of character types, the numeric type is smaller than the Chinese character type, so the character corresponding to the first code segment is larger than the character corresponding to the second code segment, and the first character string is determined to be larger than the second character string.

S105: Sort the first character string and the second character string according to the order determined between them.

In the character string sorting method disclosed by the application, a first Unicode encoding corresponding to the first character string to be sorted and a second Unicode encoding corresponding to the second character string to be sorted are obtained, and code segments are extracted from the two encodings in order from high order to low order. Extraction continues while the two extracted code segments are the same, until the first code segment extracted from the first Unicode encoding differs from the second code segment extracted from the second Unicode encoding. If the characters corresponding to these code segments are of different character types, their order is determined from the preset order of the character types, which gives the order of the first and second character strings, and the two strings are sorted accordingly. Because the strings to be sorted are converted into Unicode encodings and compared through those encodings, their order can be determined quickly even when they contain various types of characters, which improves the efficiency of string sorting.
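Steps S101 to S105 can be combined into a comparator and passed to Python's `functools.cmp_to_key`. This is a sketch under several assumptions: UTF-8 encodings, the hypothetical `char_type` ranges (ASCII digits and letters, CJK Unified Ideographs), and a plain code point comparison for two differing characters of the same type, where the optional numeric and pinyin rules of the method would otherwise apply.

```python
from functools import cmp_to_key

# Assumed preset order of character types: special < number < letter < Chinese.
TYPE_RANK = {"special": 0, "number": 1, "letter": 2, "chinese": 3}


def char_type(ch: str) -> str:
    """Hypothetical classifier; the character ranges are assumptions."""
    if "0" <= ch <= "9":
        return "number"
    if ch.isascii() and ch.isalpha():
        return "letter"
    if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs only
        return "chinese"
    return "special"


def compare_strings(s1: str, s2: str) -> int:
    """Negative, zero or positive, following steps S101 to S104."""
    # S101/S102: one UTF-8 code segment per character, high order first.
    segs1 = [ch.encode("utf-8") for ch in s1]
    segs2 = [ch.encode("utf-8") for ch in s2]
    for seg1, seg2 in zip(segs1, segs2):
        if seg1 == seg2:
            continue
        c1, c2 = seg1.decode("utf-8"), seg2.decode("utf-8")
        rank1, rank2 = TYPE_RANK[char_type(c1)], TYPE_RANK[char_type(c2)]
        if rank1 != rank2:
            # S103/S104: different types are ordered by the preset type order.
            return rank1 - rank2
        # Same type: placeholder for the optional numeric/pinyin rules.
        return ord(c1) - ord(c2)
    # No differing segment: the string with the longer encoding is larger.
    return sum(map(len, segs1)) - sum(map(len, segs2))


# S105: sort with the comparator.
names = ["报告.txt", "3.txt", "a.txt", "#.txt"]
print(sorted(names, key=cmp_to_key(compare_strings)))
```

Because each differing pair of characters is ordered first by type rank and then by code point, the comparator is a consistent total order, which `sorted` requires.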

Fig. 2 is a schematic flow chart of another character string sorting method according to an embodiment of the present application. The method comprises the following steps:

S201: Obtain a first Unicode encoding and a second Unicode encoding.

S202: Extract code segments from the first Unicode encoding and the second Unicode encoding respectively, in order from high order to low order, until the first code segment extracted from the first Unicode encoding differs from the second code segment extracted from the second Unicode encoding.

Each code segment extracted from the first Unicode encoding corresponds to one character in the first character string, and each code segment extracted from the second Unicode encoding corresponds to one character in the second character string.

Steps S201 and S202 are implemented in the same way as steps S101 and S102 shown in Fig. 1 and are not described again here.

S203: When the character corresponding to the first code segment and the character corresponding to the second code segment are both of the numeric type, extract a third code segment from the first Unicode encoding and a fourth code segment from the second Unicode encoding.

The third code segment corresponds to a first numeric string and the fourth code segment corresponds to a second numeric string. The first numeric string is the numeric string in the first character string to which the character corresponding to the first code segment belongs, and the second numeric string is the numeric string in the second character string to which the character corresponding to the second code segment belongs. Note that each numeric string may consist of a single digit and may include a decimal point.

When the first code segment extracted from the first Unicode encoding differs from the second code segment extracted from the second Unicode encoding, the character types of the characters corresponding to the two code segments are determined. If both are of the numeric type, a third code segment is extracted from the first Unicode encoding and a fourth code segment is extracted from the second Unicode encoding.

The third code segment includes the first code segment and may also include adjacent code segments whose characters are digits or decimal points. Likewise, the fourth code segment includes the second code segment and may also include adjacent code segments whose characters are digits or decimal points.
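The extraction of the whole numeric string can be sketched in Python as growing a run of digits and decimal points outward from the first differing position. The run is extended to the left as well, since the characters before that position are shared by both strings and may belong to the same number (as in "v12" versus "v13"). The name `numeric_run` is hypothetical. A malformed run such as "1.2.3" would later make `float()` raise `ValueError`, a case the text does not address.

```python
NUMERIC_CHARS = set("0123456789.")


def numeric_run(s: str, i: int) -> str:
    """Return the maximal run of digits and decimal points containing s[i]."""
    start, end = i, i + 1
    # Extend left over shared leading digits of the same number.
    while start > 0 and s[start - 1] in NUMERIC_CHARS:
        start -= 1
    # Extend right over the rest of the number.
    while end < len(s) and s[end] in NUMERIC_CHARS:
        end += 1
    return s[start:end]


s1, s2 = "wife password set 90", "wife password set 100"
i = 18  # first differing position: "9" versus "1"
print(numeric_run(s1, i), numeric_run(s2, i))  # 90 100
```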

S204: Convert the third code segment into floating-point first data and the fourth code segment into floating-point second data.

S205: Determine that the character string corresponding to the larger of the first data and the second data is larger than the other character string.

That is, if the first data is larger than the second data, the first character string is larger than the second character string; if the first data is smaller than the second data, the first character string is smaller than the second character string.

Note that because the first and second numeric strings may contain a decimal point, converting the third and fourth code segments into floating-point data before comparing them ensures that the comparison result is correct.

For example, suppose the two strings to be sorted are "wife password set 90" and "wife password set 100". First, the first Unicode corresponding to the first string "wife password set 90" and the second Unicode corresponding to the second string "wife password set 100" are obtained.

Code segments are then extracted from the first and second Unicode in order from high order to low order until the first code segment, corresponding to the character "9", is extracted from the first Unicode and the second code segment, corresponding to the character "1", is extracted from the second Unicode. Because "9" and "1" are both of the numeric type, a third code segment is extracted from the first Unicode and a fourth code segment from the second Unicode; the third code segment corresponds to the numeric string "90" and the fourth to the numeric string "100". The third code segment is converted into the floating-point (decimal) first data 90, and the fourth code segment into the floating-point (decimal) second data 100. Because the second data is larger than the first data, the second character string is determined to be larger than the first character string.

Thus, when the order of the first and second character strings must be determined by the numbers they contain, comparing those numbers by value rather than as text gives a more accurate result. In the example above, a textual comparison would compare the first digit "1" of "100" with the first digit "9" of "90", conclude that "100" is smaller, and therefore wrongly determine that the first character string is larger than the second.
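The contrast between textual and numeric comparison in this example can be reproduced with a short Python sketch of S203 to S205. It assumes that the first differing characters of the two strings are both digits and that each numeric run contains at most one decimal point (otherwise `float()` raises `ValueError`); `first_difference` and `compare_numeric` are hypothetical helper names.

```python
def numeric_run(s: str, i: int) -> str:
    """Maximal run of digits and decimal points in s that contains s[i]."""
    def in_run(ch: str) -> bool:
        return "0" <= ch <= "9" or ch == "."

    start, end = i, i + 1
    while start > 0 and in_run(s[start - 1]):
        start -= 1
    while end < len(s) and in_run(s[end]):
        end += 1
    return s[start:end]


def first_difference(a: str, b: str) -> int:
    """Index of the first position where a and b differ."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return i


def compare_numeric(a: str, b: str) -> int:
    """S203-S205: -1 if a < b, 1 if a > b, 0 if the two numbers are equal."""
    i = first_difference(a, b)
    first_data = float(numeric_run(a, i))    # S204: third code segment
    second_data = float(numeric_run(b, i))   # S204: fourth code segment
    return (first_data > second_data) - (first_data < second_data)  # S205


a, b = "wife password set 90", "wife password set 100"
print(a > b)                   # True: textual comparison puts "...90" after "...100"
print(compare_numeric(a, b))   # -1: numeric comparison puts "...90" first
```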

S206: Sort the first character string and the second character string according to their size relationship.

In the character string sorting method shown in Fig. 2, for a first character string and a second character string to be sorted, a first Unicode corresponding to the first character string and a second Unicode corresponding to the second character string are first obtained. Code segments are then extracted from the two Unicodes in order from high order to low order until the first code segment extracted from the first Unicode differs from the second code segment extracted from the second Unicode. If the characters corresponding to the first and second code segments are both of the numeric type, a third code segment corresponding to the first numeric string is extracted from the first Unicode and a fourth code segment corresponding to the second numeric string is extracted from the second Unicode; the third code segment is converted into floating-point first data, the fourth code segment into floating-point second data, and the order of the two character strings is determined by comparing the two floating-point values. Thus, when the order of the first and second character strings must be determined by the numbers they contain, the method determines it accurately by comparing the numeric values of the numeric strings in the two character strings.

Fig. 3 is a schematic flow chart of another character string sorting method according to an embodiment of the present disclosure. The method comprises the following steps:

S301: Obtain a first Unicode and a second Unicode.

The first Unicode is a Unicode corresponding to a first character string to be sorted, and the second Unicode is a Unicode corresponding to a second character string to be sorted.

S302: Extract code segments from the first Unicode and the second Unicode in order from high order to low order until the first code segment extracted from the first Unicode differs from the second code segment extracted from the second Unicode.

Each code segment extracted from the first Unicode corresponds to a character in the first character string, and each code segment extracted from the second Unicode corresponds to a character in the second character string.

Steps S301 and S302 are implemented in the same way as steps S101 and S102 shown in Fig. 1, and are not described again here.

S303: When the characters corresponding to the first code segment and the second code segment are both of the Chinese character type, obtain a first pinyin and a second pinyin.

The first pinyin is the pinyin of the character corresponding to the first code segment, and the second pinyin is the pinyin of the character corresponding to the second code segment.

In one implementation, when the characters corresponding to the first and second code segments are both of the Chinese character type, the first pinyin corresponding to the first code segment and the second pinyin corresponding to the second code segment are obtained by importing a Chinese pinyin package.

S304: Extract letters from the first pinyin and the second pinyin one at a time, from left to right.

S305: If the two letters extracted at the same position from the first pinyin and the second pinyin differ, compare the two letters.

S306: Determine that the character string corresponding to the larger of the two letters is larger than the other character string, and execute S309.

S307: If the first pinyin and the second pinyin contain the same number of letters and the letters at each position are the same, compare the first tone in the first pinyin with the second tone in the second pinyin.

S308: Determine that the character string corresponding to the larger of the first tone and the second tone is larger than the other character string, and execute S309.

Note that if the first pinyin and the second pinyin contain different numbers of letters, and all letters of one of them have been extracted without finding two different letters at the same position, the character string corresponding to the pinyin with more letters is determined to be larger than the other character string. For example, if the first pinyin is "xian" and the second pinyin is "xiang", the 1st to 4th letters of the two pinyins are identical; once all letters of the first pinyin have been extracted, no differing pair has been found, so the second character string, corresponding to the second pinyin, is determined to be larger than the first character string, corresponding to the first pinyin.

S309: Sort the first character string and the second character string according to their size relationship.

For example, assume the character corresponding to the first code segment is "text" and the character corresponding to the second code segment is "yes". Both are of the Chinese character type, so a Chinese pinyin package is used to obtain the first pinyin "wen2" and the second pinyin "wei2". Note that the pinyin includes the tone, which is represented by a number. Optionally, the first, second, third, and fourth tones and the neutral tone are represented by numbers that increase in that order, or by numbers that decrease in that order. The tones may be expressed as integers or as decimals; for example, the first, second, third, and fourth tones and the neutral tone may be represented by 1, 2, 3, 4, and 5, respectively. The "2" in the first pinyin "wen2" and in the second pinyin "wei2" is the tone.

Letters are extracted from the first pinyin and the second pinyin from left to right until the letter "n" extracted from the first pinyin differs from the letter "i" extracted from the second pinyin. Comparing the two shows that "n" is larger than "i", so the first character string is determined to be larger than the second character string.

As another example, assume the character corresponding to the first code segment is "sets" and the character corresponding to the second code segment is "seasons". Both are of the Chinese character type, and a Chinese pinyin package yields the first pinyin "ji2" and the second pinyin "ji4". Letters are extracted from the two pinyins from left to right; because the letters at each position are the same, the order of the two character strings is determined by comparing the first tone in the first pinyin with the second tone in the second pinyin. Taking the convention in which the first, second, third, and fourth tones and the neutral tone are represented by increasing numbers as an example, the tone in the second pinyin (4) is larger than the tone in the first pinyin (2), so the second character string is determined to be larger than the first character string.

Note that if the tones are expressed as decimals, the two decimals representing the tones can be converted into floating-point data before their sizes are compared.
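Steps S304 to S308, together with the prefix rule and the decimal-tone note, can be sketched in Python as follows. The sketch assumes each pinyin is written as lowercase letters followed by a numeric tone (for example "wen2", or "ji2.5" if decimal tones are used), with tone numbers increasing from the first tone to the neutral tone. `split_pinyin` and `compare_pinyin` are hypothetical names; in practice a pinyin package would supply the input strings.

```python
import re
from typing import Tuple

# Assumed format: lowercase letters (ü allowed), then an integer or decimal tone.
_PINYIN = re.compile(r"([a-zü]+)(\d+(?:\.\d+)?)")


def split_pinyin(p: str) -> Tuple[str, float]:
    """Split e.g. "wen2" into ("wen", 2.0); the tone becomes floating-point data."""
    m = _PINYIN.fullmatch(p)
    if m is None:
        raise ValueError(f"unexpected pinyin format: {p!r}")
    return m.group(1), float(m.group(2))


def compare_pinyin(p1: str, p2: str) -> int:
    """Return -1, 0 or 1 following S304-S308 and the prefix rule."""
    letters1, tone1 = split_pinyin(p1)
    letters2, tone2 = split_pinyin(p2)
    for c1, c2 in zip(letters1, letters2):       # S304: left to right
        if c1 != c2:                              # S305/S306: first differing letter
            return 1 if c1 > c2 else -1
    if len(letters1) != len(letters2):           # one pinyin is a prefix ("xian"/"xiang")
        return 1 if len(letters1) > len(letters2) else -1
    return (tone1 > tone2) - (tone1 < tone2)     # S307/S308: compare tones


print(compare_pinyin("wen2", "wei2"))     # 1  ("n" > "i")
print(compare_pinyin("ji2", "ji4"))       # -1 (tone 2 < tone 4)
print(compare_pinyin("xian1", "xiang1"))  # -1 (prefix rule)
```

Comparing letters with `>` compares their code points, which for "a" to "z" matches alphabetical order.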

In the character string sorting method shown in Fig. 3, for a first character string and a second character string to be sorted, a first Unicode corresponding to the first character string and a second Unicode corresponding to the second character string are first obtained, and code segments are extracted from the two Unicodes in order from high order to low order until the first code segment extracted from the first Unicode differs from the second code segment extracted from the second Unicode. If the characters corresponding to the two code segments are both of the Chinese character type, a first pinyin and a second pinyin are obtained and their letters are compared position by position from left to right. If two letters at the same position differ, the size relationship between the two character strings is determined by comparing those letters; if the letters at every position are the same, it is determined by comparing the tones of the first and second pinyin. The character strings are then sorted accordingly.

Because Chinese has a large number of homophones, the first differing characters found, scanning from high order to low order, in the first and second character strings to be sorted may be homophones. In this case, the order of the two character strings may be determined from the ASCII codes of the two homophones.

Specifically, on the basis of the character string sorting method shown in fig. 3, the following steps may be further provided:

When the first pinyin and the second pinyin are the same, obtain a first ASCII code of the character corresponding to the first code segment and a second ASCII code of the character corresponding to the second code segment, compare the two, and determine that the character string corresponding to the larger of the first ASCII code and the second ASCII code is larger than the other character string.

According to the character string sorting method disclosed by the application, under the condition that the characters corresponding to the first code segment and the characters corresponding to the second code segment are homophones, ASCII codes of the two characters are obtained, and the sizes of the two character strings are determined by comparing the sizes of the two ASCII codes, so that the two character strings are sorted.

In an implementation, when the first code segment extracted from the first Unicode differs from the second code segment extracted from the second Unicode and the characters corresponding to both code segments are special characters, the first code segment is converted into a third ASCII code and the second code segment into a fourth ASCII code, and the character string corresponding to the larger of the third and fourth ASCII codes is determined to be larger than the other character string.

In an implementation, when the first code segment extracted from the first Unicode differs from the second code segment extracted from the second Unicode and the characters corresponding to both code segments are letters, the two letters are compared, and the character string corresponding to the larger letter is determined to be larger than the other character string.
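The homophone, special-character, and letter rules all reduce to comparing the codes of the two differing characters. Chinese characters have no ASCII code (ASCII covers only 128 characters), so the Python sketch below assumes the Unicode code point returned by `ord()` is what is meant; for ASCII characters the two coincide. The pinyin table is a hypothetical two-entry stand-in for a pinyin package, and `compare_by_code` and `compare_homophones` are illustrative names.

```python
# Hypothetical stand-in for a pinyin package: two homophones read "ji2".
PINYIN = {"集": "ji2", "级": "ji2"}


def compare_by_code(c1: str, c2: str) -> int:
    """Return -1, 0 or 1 by comparing the characters' code points."""
    return (ord(c1) > ord(c2)) - (ord(c1) < ord(c2))


def compare_homophones(c1: str, c2: str) -> int:
    """Tie-break for two Chinese characters whose pinyin (including tone) is identical."""
    if PINYIN[c1] != PINYIN[c2]:
        raise ValueError("not homophones: compare the pinyin instead")
    return compare_by_code(c1, c2)


print(compare_by_code("#", "@"))  # -1: special characters, 35 < 64
print(compare_by_code("b", "a"))  # 1: letters
print(compare_homophones("集", "级"))
```

Because raw codes are compared, every uppercase ASCII letter sorts before every lowercase one ("Z" is 90, "a" is 97); the description does not say how letter case should be treated, so a case-folding step may be wanted in practice.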

In an implementation, if all code segments of either the first Unicode or the second Unicode have been extracted without finding a differing pair of code segments, the character string corresponding to the longer of the first Unicode and the second Unicode is determined to be larger than the other character string.

For example, the first string is "100th set of wife's password" and the second string is "100th set of wife's password-highlight". Code segments are extracted, in order from high order to low order, from the first Unicode corresponding to the first string and from the second Unicode corresponding to the second string. The first string contains 9 characters, and the first 9 characters of the two strings are identical, so after all code segments of the first string have been extracted, no differing code segment has been found. In this case, the second character string is determined to be larger than the first character string.

That is, in the above embodiment, code segments are extracted from the first Unicode and the second Unicode in order from high order to low order, and if all code segments of either Unicode have been extracted without finding a differing pair, the character string corresponding to the longer Unicode is determined to be larger than the other character string.
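The length rule applies only when one string is a prefix of the other. A minimal Python sketch, in which `None` signals that a differing position exists and one of the other rules must decide; `compare_prefix` is a hypothetical name.

```python
from typing import Optional


def compare_prefix(a: str, b: str) -> Optional[int]:
    """Return -1/0/1 if one string is a prefix of the other, else None."""
    for c1, c2 in zip(a, b):
        if c1 != c2:
            return None                           # a differing code segment exists
    return (len(a) > len(b)) - (len(a) < len(b))  # the longer string is larger


print(compare_prefix("100th set of wife's password",
                     "100th set of wife's password-highlight"))  # -1
```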

The character string sorting method disclosed in the present application is further described below with reference to Fig. 4. Referring to Fig. 4, the method includes:

S401: Obtain a first Unicode and a second Unicode.

The first Unicode corresponds to the first character string, and the second Unicode corresponds to the second character string.

S402: Extract the ith code segment from each of the first Unicode and the second Unicode, in order from high order to low order; the initial value of i is 1.

S403: Determine whether the two extracted ith code segments are the same. If they are the same, execute S404; if they differ, execute S406.

S404: Determine whether both the first Unicode and the second Unicode still have unextracted code segments. If so, execute S405; otherwise, execute S411.

S405: Increment i by 1 and execute S402.

S406: Determine the character types of the characters corresponding to the two ith code segments. If the types differ, execute S407. If the types are the same, execute one of S408, S409, S410, and S412 according to the type: S408 if both are numeric, S409 if both are Chinese characters, S410 if both are special characters, and S412 if both are letters.

S407: Determine the size relationship between the characters corresponding to the two ith code segments according to the preset size relationship among a plurality of character types, thereby determining the size relationship between the first character string and the second character string, and execute S413.

S408: Extract a third code segment from the first Unicode and a fourth code segment from the second Unicode, determine the size relationship between the first character string and the second character string according to the third code segment and the fourth code segment, and execute S413.

S409: Obtain the first pinyin and the second pinyin of the characters corresponding to the two ith code segments, determine the size relationship between the first character string and the second character string according to the first pinyin and the second pinyin, and execute S413.

S410: Convert the two ith code segments into ASCII codes, determine the size relationship between the first character string and the second character string according to the two ASCII codes, and execute S413.

S411: Determine the size relationship between the first character string and the second character string according to the lengths of the first Unicode and the second Unicode, and execute S413.

S412: Determine the size relationship between the first character string and the second character string according to the two letters, and execute S413.

S413: Sort the first character string and the second character string according to their size relationship.
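The Fig. 4 flow can be assembled into a single three-way comparison function and plugged into a standard sort. The Python sketch below is one possible reading, not the application's implementation. Its assumptions: the preset order of character types (`TYPE_RANK`, used in S407) is hypothetical because this passage does not state it; `PINYIN` is a hypothetical, illustrative stand-in for a pinyin package; code points stand in for "ASCII codes"; each numeric run holds at most one decimal point; and when a numeric comparison ties (for example "010" against "10") the sketch falls back to the code points of the differing characters, a case the description does not cover. `functools.cmp_to_key` adapts the comparator to `sorted`.

```python
import re
from functools import cmp_to_key

# Hypothetical preset order of character types (S407): lower rank sorts first.
TYPE_RANK = {"special": 0, "number": 1, "letter": 2, "chinese": 3}
# Hypothetical stand-in for a Chinese pinyin package (S409).
PINYIN = {"集": "ji2", "季": "ji4", "级": "ji2", "文": "wen2", "为": "wei2"}


def sign(x, y) -> int:
    return (x > y) - (x < y)


def char_type(ch: str) -> str:
    if "0" <= ch <= "9":
        return "number"
    if "a" <= ch <= "z" or "A" <= ch <= "Z":
        return "letter"
    if "\u4e00" <= ch <= "\u9fff":
        return "chinese"
    return "special"


def numeric_run(s: str, i: int) -> str:
    """Maximal run of digits and decimal points around s[i] (third/fourth code segment)."""
    def in_run(ch: str) -> bool:
        return "0" <= ch <= "9" or ch == "."
    start, end = i, i + 1
    while start > 0 and in_run(s[start - 1]):
        start -= 1
    while end < len(s) and in_run(s[end]):
        end += 1
    return s[start:end]


def compare_pinyin(p1: str, p2: str) -> int:
    """Letters left to right, then the prefix rule, then the tones."""
    pattern = r"([a-zü]+)(\d+(?:\.\d+)?)"
    letters1, tone1 = re.fullmatch(pattern, p1).groups()
    letters2, tone2 = re.fullmatch(pattern, p2).groups()
    for c1, c2 in zip(letters1, letters2):
        if c1 != c2:
            return sign(c1, c2)
    if len(letters1) != len(letters2):
        return sign(len(letters1), len(letters2))
    return sign(float(tone1), float(tone2))


def compare_strings(a: str, b: str) -> int:
    """Three-way comparison following S402-S412 of Fig. 4."""
    for i in range(min(len(a), len(b))):        # S402/S405: walk position by position
        c1, c2 = a[i], b[i]
        if c1 == c2:                             # S403: same code segment, continue
            continue
        t1, t2 = char_type(c1), char_type(c2)    # S406
        if t1 != t2:                             # S407: preset type order decides
            return sign(TYPE_RANK[t1], TYPE_RANK[t2])
        if t1 == "number":                       # S408: compare numeric values
            result = sign(float(numeric_run(a, i)), float(numeric_run(b, i)))
        elif t1 == "chinese":                    # S409: compare pinyin
            result = compare_pinyin(PINYIN[c1], PINYIN[c2])
        else:                                    # S410/S412: decided by code below
            result = 0
        # Code-point fallback: homophones, special characters and letters;
        # it also settles numeric ties (an assumption of this sketch).
        return result or sign(ord(c1), ord(c2))
    return sign(len(a), len(b))                  # S404 -> S411: longer string is larger


def sort_strings(strings):
    """S413: sort with the three-way comparator."""
    return sorted(strings, key=cmp_to_key(compare_strings))


print(sort_strings(["wife password set 100", "wife password set 90",
                    "wife password set 9"]))
# ['wife password set 9', 'wife password set 90', 'wife password set 100']
```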

The application discloses a character string sorting method and correspondingly a character string sorting device. The description of the character string sorting method and the character string sorting apparatus in the specification may be referred to each other.

Referring to fig. 5, fig. 5 is a schematic structural diagram of a character string sorting apparatus disclosed in the present application. The device comprises a data acquisition unit 10, a code segment extraction unit 20, a character type comparison unit 30 and a sorting unit 40.

Wherein:

The data acquisition unit 10 is configured to obtain the first Unicode and the second Unicode, where the first Unicode corresponds to the first character string to be sorted and the second Unicode corresponds to the second character string to be sorted.

The code segment extraction unit 20 is configured to extract code segments from the first Unicode and the second Unicode in order from high order to low order until the first code segment extracted from the first Unicode differs from the second code segment extracted from the second Unicode. Each code segment extracted from the first Unicode corresponds to a character in the first character string, and each code segment extracted from the second Unicode corresponds to a character in the second character string.

The character type comparison unit 30 is configured to, when the characters corresponding to the first and second code segments are of different character types, determine the size relationship between the two characters according to a preset size relationship among a plurality of character types, and determine that the character string corresponding to the larger of the two characters is larger than the other character string.

The sorting unit 40 is configured to sort the first character string and the second character string according to their size relationship.

The character string sorting apparatus disclosed in the present application converts the first and second character strings to be sorted into Unicode and extracts code segments from the two Unicodes in order from high order to low order until the extracted code segments differ. If the characters corresponding to these two code segments are of different character types, the apparatus determines the order of the two characters according to their character types, determines the order of the first and second character strings from the order of those two characters, and then sorts the two strings. Because the apparatus converts the strings to be sorted into Unicode and determines their order by comparing Unicode, it can quickly determine the size relationship among multiple character strings, and thus sort them, even when those strings contain characters of multiple types.

Optionally, on the basis of the character string sorting apparatus disclosed above, a numeric string comparison unit 50 is further provided, as shown in Fig. 6.

The numeric string comparison unit 50 is configured to: when the characters corresponding to the first code segment and the second code segment are both of the numeric type, extract a third code segment from the first Unicode and a fourth code segment from the second Unicode, where the third code segment corresponds to the first numeric string and the fourth code segment corresponds to the second numeric string, the first numeric string being the numeric string in the first character string to which the character corresponding to the first code segment belongs, and the second numeric string being the numeric string in the second character string to which the character corresponding to the second code segment belongs; convert the third code segment into floating-point first data and the fourth code segment into floating-point second data; and determine that the character string corresponding to the larger of the first data and the second data is larger than the other character string.

Optionally, on the basis of the above-disclosed character string sorting apparatus, a pinyin comparison unit 60 is further provided, as shown in fig. 6.

The pinyin comparison unit 60 is configured to: when the characters corresponding to the first code segment and the second code segment are both of the Chinese character type, obtain a first pinyin, which is the pinyin of the character corresponding to the first code segment, and a second pinyin, which is the pinyin of the character corresponding to the second code segment; extract letters from the first pinyin and the second pinyin from left to right, and if the two letters extracted at the same position differ, compare them and determine that the character string corresponding to the larger letter is larger than the other character string; and if the first pinyin and the second pinyin contain the same number of letters and the letters at each position are the same, compare the first tone in the first pinyin with the second tone in the second pinyin and determine that the character string corresponding to the larger tone is larger than the other character string.

Optionally, on the basis of the above-disclosed character string sorting apparatus, a standard code comparing unit 70 is further provided, as shown in fig. 6.

The standard code comparing unit 70 is configured to: under the condition that the pinyin comparison unit 60 determines that the first pinyin and the second pinyin are the same, a first ASCII code of the character corresponding to the first code segment and a second ASCII code of the character corresponding to the second code segment are obtained, the sizes of the first ASCII code and the second ASCII code are compared, and the character string corresponding to the larger one of the first ASCII code and the second ASCII code is determined to be larger than the other character string.

Optionally, on the basis of the above-disclosed character string sorting apparatus, a length comparison unit 80 is further provided, as shown in fig. 6.

The length comparison unit 80 is configured to: when all code segments of either the first Unicode or the second Unicode have been extracted without finding a differing pair of code segments, determine that the character string corresponding to the longer of the first Unicode and the second Unicode is larger than the other character string.

Optionally, on the basis of the character string sorting apparatus disclosed above, a special character comparison unit is further provided. The special character comparison unit is configured to: when the characters corresponding to the first code segment and the second code segment are both special characters, convert the first code segment into a third ASCII code and the second code segment into a fourth ASCII code, and determine that the character string corresponding to the larger of the third and fourth ASCII codes is larger than the other character string.

Optionally, on the basis of the character string sorting apparatus disclosed above, a letter comparison unit is further provided. The letter comparison unit is configured to: when the characters corresponding to the first code segment and the second code segment are both letters, compare the two letters and determine that the character string corresponding to the larger letter is larger than the other character string.

Finally, it should also be noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed in the embodiment corresponds to the method disclosed in the embodiment, so that the description is simple, and the relevant points can be referred to the description of the method part.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of string ordering, comprising:

under the condition that the characters corresponding to the first code segment and the second code segment are both special characters, converting the first code segment into a third ASCII code, converting the second code segment into a fourth ASCII code, and determining that a character string corresponding to the larger one of the third ASCII code and the fourth ASCII code is larger than the other character string; under the condition that the characters corresponding to the first code segment and the characters corresponding to the second code segment are homophones, ASCII codes of the two characters are obtained, and the sizes of the two character strings are determined by comparing the sizes of the two ASCII codes;

2. The method of claim 1, further comprising:

determining that the character string corresponding to the larger of the first data and the second data is larger than the other character string.

3. The method of claim 1 or 2, further comprising:

4. The method of claim 3, further comprising:

5. The method of claim 1 or 2, further comprising:

6. A character string sorting apparatus, comprising:

a code segment extracting unit, configured to sequentially extract code segments from the first uniform code and the second uniform code, respectively, in order from the high-order end to the low-order end, until a first code segment extracted from the first uniform code differs from a second code segment extracted from the second uniform code; wherein each code segment extracted from the first uniform code corresponds to a character in the first character string, and each code segment extracted from the second uniform code corresponds to a character in the second character string;

a special character comparison unit, configured to, when the characters corresponding to the first code segment and the second code segment are both special characters, convert the first code segment into a third ASCII code and the second code segment into a fourth ASCII code, and determine that the character string corresponding to the larger of the third ASCII code and the fourth ASCII code is larger than the other character string;

a standard code comparison unit, configured to, when the characters corresponding to the first code segment and the second code segment are homophones, obtain the ASCII codes of the two characters and determine the relative size of the two character strings by comparing the two ASCII codes;
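The code segment extracting unit of claim 6 can be sketched in Python as a scan that stops at the first differing position. This sketch assumes that each code segment is one Unicode code point (one element of a Python `str`) and that "from high order to low order" means scanning from the first character of the string toward the last; the function name `first_difference` is hypothetical. The claims in this section do not say what happens when one string is a prefix of the other, so the sketch simply returns `None` in that case and leaves the decision to the caller.

```python
from typing import Optional, Tuple


def first_difference(s1: str, s2: str) -> Optional[Tuple[int, str, str]]:
    """Scan both strings from the first character onward.

    Returns (index, char1, char2) for the first position at which the
    characters differ, or None if no position within the shorter length differs.
    """
    for i, (c1, c2) in enumerate(zip(s1, s2)):
        if c1 != c2:
            # c1 and c2 play the roles of the first and second code segments.
            return i, c1, c2
    return None


print(first_difference("abc#1", "abc!2"))  # prints (3, '#', '!')
```

The returned characters are what the type-specific comparison units (special character, numeric, pinyin, standard code) would then operate on.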

7. The apparatus of claim 6, further comprising:

a numeric string comparison unit, configured to, when the characters corresponding to the first code segment and the second code segment are both of the numeric type, extract a third code segment from the first uniform code and a fourth code segment from the second uniform code, wherein the third code segment corresponds to a first numeric string and the fourth code segment corresponds to a second numeric string, the first numeric string being the numeric string in the first character string to which the character corresponding to the first code segment belongs, and the second numeric string being the numeric string in the second character string to which the character corresponding to the second code segment belongs; convert the third code segment into first data of a floating-point type and the fourth code segment into second data of the floating-point type; and determine that the character string corresponding to the larger of the first data and the second data is larger than the other character string.
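A possible Python sketch of the numeric string comparison unit follows. It assumes a "numeric string" is a maximal run of ASCII digits (the claim does not say whether decimal points or signs are included), and the helper names are hypothetical. Because the two strings share a common prefix up to the first difference, the run containing the differing digit may start before it, which is why the sketch also scans backwards: in "a19" versus "a110" the first difference is at index 2, but the numbers being compared are 19 and 110.

```python
def _is_digit(ch: str) -> bool:
    # Restrict to ASCII digits so that float() can always parse the run.
    return "0" <= ch <= "9"


def numeric_run(s: str, i: int) -> str:
    """Return the maximal run of ASCII digits in s that contains position i."""
    start = i
    while start > 0 and _is_digit(s[start - 1]):
        start -= 1  # the run may begin before the first differing position
    end = i
    while end < len(s) and _is_digit(s[end]):
        end += 1
    return s[start:end]


def compare_numeric(s1: str, s2: str, i: int) -> int:
    """Compare s1 and s2 whose first difference is at index i (a digit in both)."""
    first_data = float(numeric_run(s1, i))   # the "first data" of floating-point type
    second_data = float(numeric_run(s2, i))  # the "second data" of floating-point type
    return (first_data > second_data) - (first_data < second_data)


print(compare_numeric("file12", "file9", 4))  # 12.0 > 9.0, so this prints 1
print(compare_numeric("a19", "a110", 2))      # 19.0 < 110.0, so this prints -1
```

Converting to a floating-point value, as the claim specifies, means very long digit runs (beyond roughly 15 to 17 significant digits) may be rounded so that two different numbers compare as equal; comparing the runs as Python `int` values would avoid that if exactness matters.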

8. The apparatus of claim 6 or 7, further comprising:

a pinyin comparison unit, configured to, when the characters corresponding to the first code segment and the second code segment are both Chinese characters, obtain a first pinyin and a second pinyin, wherein the first pinyin is the pinyin of the character corresponding to the first code segment and the second pinyin is the pinyin of the character corresponding to the second code segment; extract letters from the first pinyin and the second pinyin, respectively, in order from left to right; if the two letters at the same position in the first pinyin and the second pinyin differ, compare the two letters and determine that the character string corresponding to the larger letter is larger than the other character string; and if the first pinyin and the second pinyin contain the same number of letters and the letters at each position are identical, compare a first tone of the first pinyin with a second tone of the second pinyin and determine that the character string corresponding to the larger of the first tone and the second tone is larger than the other character string.
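The pinyin comparison of claim 8 can be sketched as follows. The sketch assumes each pinyin is already available as a pair of toneless letters and a tone number (for example `("lin", 2)`); how the pinyin is obtained is outside this sketch, and the `Pinyin` alias and `compare_pinyin` name are hypothetical. The claim covers differing letters and equal-length identical letters, but not the case where one pinyin is a prefix of the other (such as "li" and "lin"); the sketch assumes the shorter one sorts first.

```python
from typing import Tuple

# Hypothetical representation: (toneless pinyin letters, tone number 1-5).
Pinyin = Tuple[str, int]


def compare_pinyin(p1: Pinyin, p2: Pinyin) -> int:
    """Return 1 if p1 sorts after p2, -1 if before, 0 if letters and tone match."""
    letters1, tone1 = p1
    letters2, tone2 = p2
    # Compare letters at the same position, from left to right.
    for a, b in zip(letters1, letters2):
        if a != b:
            return 1 if a > b else -1
    # Assumption not stated in the claim: if one pinyin is a prefix of the
    # other, the shorter pinyin sorts first.
    if len(letters1) != len(letters2):
        return 1 if len(letters1) > len(letters2) else -1
    # Same number of letters and identical letters: compare the tones.
    return (tone1 > tone2) - (tone1 < tone2)


print(compare_pinyin(("ma", 1), ("ma", 3)))       # tone 1 < tone 3, prints -1
print(compare_pinyin(("huang", 2), ("kai", 3)))   # 'h' < 'k', prints -1
```

A result of 0 means the two characters are homophones, which is the case the standard code comparison unit handles.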

9. The apparatus of claim 8, further comprising:

the standard code comparison unit is further configured to, when the pinyin comparison unit determines that the first pinyin is the same as the second pinyin, obtain a first ASCII code of the character corresponding to the first code segment and a second ASCII code of the character corresponding to the second code segment, compare the first ASCII code with the second ASCII code, and determine that the character string corresponding to the larger of the first ASCII code and the second ASCII code is larger than the other character string.
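Taken together, claims 8 and 9 amount to ordering Chinese characters by pinyin letters, then tone, then character code. One compact way to sketch this in Python is a sort key tuple. Since Chinese characters have no ASCII code, the sketch substitutes the Unicode code point from `ord()` for the value the claims call the "ASCII code"; that substitution is an assumption. The small `PINYIN` table is a hypothetical stand-in for a real pinyin dictionary. Python compares the letter strings lexicographically, so a pinyin that is a prefix of another (such as "li" and "lin") sorts first, which is also an assumption rather than something stated in the claims.

```python
# Hypothetical pinyin table; a real system would use a full pinyin dictionary.
PINYIN = {
    "林": ("lin", 2),
    "临": ("lin", 2),  # homophone of 林
    "李": ("li", 3),
    "黄": ("huang", 2),
}


def chinese_key(ch: str):
    """Sort key: pinyin letters, then tone, then code point as a tie-breaker."""
    letters, tone = PINYIN[ch]
    # ord() gives the Unicode code point, used here in place of the "ASCII code".
    return (letters, tone, ord(ch))


print(sorted(["林", "李", "临", "黄"], key=chinese_key))
```

The two homophones 林 and 临 share the key prefix `("lin", 2)`, so their relative order is decided only by the third element, mirroring the fallback in claim 9.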

10. The apparatus of claim 6 or 7, further comprising: