CN111339735A - Character string length calculation method and device and computer storage medium - Google Patents

Publication number
CN111339735A
Authority
CN
China
Prior art keywords
code point
state
code
state machine
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010152674.9A
Other languages
Chinese (zh)
Other versions
CN111339735B (en)
Inventor
张宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Cubesili Information Technology Co Ltd
Original Assignee
Guangzhou Huaduo Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huaduo Network Technology Co Ltd filed Critical Guangzhou Huaduo Network Technology Co Ltd
Priority to CN202010152674.9A priority Critical patent/CN111339735B/en
Publication of CN111339735A publication Critical patent/CN111339735A/en
Application granted granted Critical
Publication of CN111339735B publication Critical patent/CN111339735B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Generation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The application discloses a character string length calculation method and device and a computer storage medium, belonging to the field of electronic technologies. The method comprises: acquiring all code points corresponding to a target character string; dividing all the code points to obtain one or more target code point sets; and determining the number of target code point sets as the length of the target character string. Each target code point set corresponds to one character in the target character string, and that character may be an emoji character or a non-emoji character. Taking the number of target code point sets as the length of the target character string therefore gives every emoji character and every non-emoji character a length of 1, which improves the accuracy of character display.

Description

Character string length calculation method and device and computer storage medium
Technical Field
The present application relates to the field of electronic technologies, and in particular, to a method and an apparatus for calculating a string length, and a computer storage medium.
Background
Emoji are visual emotion symbols used in text. They are encoded in Unicode and rendered as icons on computer devices. In instant messaging software, emoji such as the small yellow smiley faces are widely used to enrich chat content. Instant messaging software can transmit and display character strings containing emoji, achieving mixed typesetting of text and icons.
Calculating the length of a character string is a widely used, high-frequency operation. For example, software may specify that a user's custom nickname cannot exceed 10 characters, that a password cannot be shorter than 6 characters, and that a post cannot exceed 140 characters. These application scenarios all involve calculating the length of a character string. In the related art, when the length of a character string containing emoji is calculated, the number of Unicode code points constituting an emoji is generally taken as the character length of that emoji.
However, an emoji may be composed of multiple Unicode code points, in which case the character length calculated for it in the related art is greater than 1. The computer device renders the emoji as a single icon but judges the character length of that icon to be greater than 1, so the actual character length of the displayed content is inconsistent with the calculated character length, and display accuracy is low.
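The mismatch described above can be reproduced with a short sketch. It relies on the fact that Python's built-in `len` counts Unicode code points in a `str`; the sample nickname is purely illustrative and is not taken from the application.

```python
# A flag emoji is encoded as two code points but is typically rendered as one icon.
flag = "\U0001F1E8\U0001F1F3"
nickname = "Hi" + flag  # illustrative nickname: two letters followed by one flag

# len() counts code points, which mirrors the related-art calculation.
code_point_length = len(nickname)
print(code_point_length)  # 4, although a reader sees only three characters
```

A length limit checked against this value would reject nicknames that look shorter than the limit on screen.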
Disclosure of Invention
The application provides a character string length calculation method and device and a computer storage medium, which can solve the problem of low display accuracy in the related art. The technical solution is as follows:
In a first aspect, a method for calculating a length of a character string is provided, the method comprising:
acquiring all code points corresponding to the target character string;
dividing all the code points to obtain one or more target code point sets, wherein each target code point set comprises one or more code points and corresponds to one character, the character is a non-emoji character or an emoji character, the target code point set corresponding to each non-emoji character comprises one code point, and the target code point set corresponding to each emoji character comprises one or more code points;
and determining the number of the target code point sets as the length of the target character string.
Optionally, all the code points include one or more of a digital code point, an expression code point, a country region code point, a modifier code point, and a connector code point.
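The code point types listed above can be illustrated with a small classifier. This is a minimal sketch: the text does not give the code point ranges for each type, so every range below is an assumption chosen for illustration (regional indicators for country region code points, the zero width joiner for the connector, skin tone modifiers plus U+FE0F and U+20E3 for modifiers, and two rough emoji blocks for expression code points). The names `Kind` and `classify` are hypothetical.

```python
from enum import Enum


class Kind(Enum):
    DIGITAL = "digital code point"
    EXPRESSION = "expression code point"
    REGION = "country region code point"
    MODIFIER = "modifier code point"
    CONNECTOR = "connector code point"
    OTHER = "any other code point"


def classify(ch: str) -> Kind:
    """Map one code point to a type. All ranges are illustrative assumptions."""
    cp = ord(ch)
    if ch in "0123456789#*":
        return Kind.DIGITAL
    if 0x1F1E6 <= cp <= 0x1F1FF:  # regional indicator symbols
        return Kind.REGION
    if cp == 0x200D:  # zero width joiner
        return Kind.CONNECTOR
    # Checked before the expression ranges, which contain the skin tone block.
    if cp in (0xFE0F, 0x20E3) or 0x1F3FB <= cp <= 0x1F3FF:
        return Kind.MODIFIER
    if 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF:  # rough emoji blocks
        return Kind.EXPRESSION
    return Kind.OTHER
```

Keeping the ranges in one function means that a revision of the Unicode emoji data only requires changes here.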
Optionally, the number of all the code points is n, where n is a positive integer, and dividing all the code points to obtain one or more target code point sets includes:
acquiring an initial code point set, wherein the initial code point set comprises the j-th code point of all the code points, and the initial value of j is 1;
executing a code point set division process on the initial code point set, wherein the code point set division process comprises the following steps:
reading the state machine state corresponding to the (j+1)-th code point of all the code points;
determining, according to the state machine state corresponding to the j-th code point and the state machine state corresponding to the (j+1)-th code point, whether the (j+1)-th code point belongs to the initial code point set;
when the (j+1)-th code point belongs to the initial code point set, adding the (j+1)-th code point to the initial code point set to obtain an updated initial code point set,
if j+1 < n and the (j+1)-th code point is not a country region code point, setting j = j+1 and executing the code point set division process again on the updated initial code point set,
if j+1 < n and the (j+1)-th code point is a country region code point, taking the updated initial code point set as a target code point set, generating a new initial code point set that comprises the (j+2)-th code point, setting j = j+2, and executing the code point set division process on the new initial code point set,
if j+1 = n, taking the updated initial code point set as a target code point set;
when the (j+1)-th code point does not belong to the initial code point set, taking the initial code point set as a target code point set and generating a new initial code point set that comprises the (j+1)-th code point,
and if j+1 < n, setting j = j+1 and executing the code point set division process on the new initial code point set, and if j+1 = n, taking the new initial code point set as a target code point set.
Optionally, reading the state machine state corresponding to the (j+1)-th code point of all the code points includes:
if the state machine state corresponding to the j-th code point is the default state, then:
when the (j+1)-th code point is an expression code point, determining that the state machine state corresponding to the (j+1)-th code point is the character drawing state,
when the (j+1)-th code point is a digital code point, determining that the state machine state corresponding to the (j+1)-th code point is the quasi-drawing character state,
when the (j+1)-th code point is a country region code point, determining that the state machine state corresponding to the (j+1)-th code point is the country region state,
when the (j+1)-th code point is none of an expression code point, a digital code point and a country region code point, determining that the state machine state corresponding to the (j+1)-th code point is the default state;
if the state machine state corresponding to the j-th code point is the character drawing state or the modifier state, then:
when the (j+1)-th code point is a modifier code point, determining that the state machine state corresponding to the (j+1)-th code point is the modifier state,
when the (j+1)-th code point is a connector code point, determining that the state machine state corresponding to the (j+1)-th code point is the connector state,
when the (j+1)-th code point is neither a modifier code point nor a connector code point, taking the state machine state corresponding to the j-th code point as the default state, and then reading the state machine state corresponding to the (j+1)-th code point from the default state;
if the state machine state corresponding to the j-th code point is the quasi-drawing character state, then:
when the (j+1)-th code point is a modifier code point, determining that the state machine state corresponding to the (j+1)-th code point is the modifier state,
when the (j+1)-th code point is not a modifier code point, taking the state machine state corresponding to the j-th code point as the default state, and then reading the state machine state corresponding to the (j+1)-th code point from the default state;
if the state machine state corresponding to the j-th code point is the country region state, then:
when the (j+1)-th code point is a country region code point, determining that the state machine state corresponding to the (j+1)-th code point is the country region state,
when the (j+1)-th code point is not a country region code point, taking the state machine state corresponding to the j-th code point as the default state, and then reading the state machine state corresponding to the (j+1)-th code point from the default state;
if the state machine state corresponding to the j-th code point is the connector state, then:
when the (j+1)-th code point is an expression code point, determining that the state machine state corresponding to the (j+1)-th code point is the character drawing state,
when the (j+1)-th code point is a digital code point, determining that the state machine state corresponding to the (j+1)-th code point is the quasi-drawing character state,
and when the (j+1)-th code point is neither an expression code point nor a digital code point, taking the state machine state corresponding to the (j-1)-th code point as the default state, reading the state machine state corresponding to the j-th code point again, and then reading the state machine state corresponding to the (j+1)-th code point.
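The state reading rules above can be written as a lookup table. The sketch below is one possible encoding, not the application's implementation: the string labels and the names `TRANSITIONS`, `FROM_DEFAULT` and `next_state` are hypothetical, and the code point type is assumed to be supplied by a separate classifier. Every pair that the table does not list follows the fallback in the text, where the current state is treated as the default state and the code point is read from there.

```python
# Hypothetical labels for the state machine states named in the text.
DEFAULT, DRAWING, QUASI, REGION, MODIFIER, CONNECTOR = (
    "default", "character drawing", "quasi-drawing character",
    "country region", "modifier", "connector",
)

# Transitions out of the default state, keyed by the type of the next code point.
FROM_DEFAULT = {"expression": DRAWING, "digital": QUASI, "country region": REGION}

# Transitions out of the other states; any pair not listed falls back to the default rules.
TRANSITIONS = {
    (DRAWING, "modifier"): MODIFIER,
    (DRAWING, "connector"): CONNECTOR,
    (MODIFIER, "modifier"): MODIFIER,
    (MODIFIER, "connector"): CONNECTOR,
    (QUASI, "modifier"): MODIFIER,
    (REGION, "country region"): REGION,
    (CONNECTOR, "expression"): DRAWING,
    (CONNECTOR, "digital"): QUASI,
}


def next_state(state: str, kind: str) -> str:
    """Return the state machine state for the next code point of type `kind`."""
    if (state, kind) in TRANSITIONS:
        return TRANSITIONS[(state, kind)]
    # Fallback: treat the current state as default and read the code point from there.
    return FROM_DEFAULT.get(kind, DEFAULT)
```

The table only yields the next state. The re-reading of the j-th code point in the connector case affects which set the connector ends up in and is not modelled here.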
Optionally, determining, according to the state machine state corresponding to the j-th code point and the state machine state corresponding to the (j+1)-th code point, whether the (j+1)-th code point belongs to the initial code point set includes:
if the state machine state corresponding to the j-th code point is the default state, then:
when the (j+1)-th code point is an expression code point, a digital code point or a country region code point, determining that the (j+1)-th code point belongs to the initial code point set,
when the (j+1)-th code point is none of an expression code point, a digital code point and a country region code point, determining that the (j+1)-th code point does not belong to the initial code point set;
if the state machine state corresponding to the j-th code point is the character drawing state or the modifier state, then:
when the (j+1)-th code point is a modifier code point or a connector code point, determining that the (j+1)-th code point belongs to the initial code point set,
when the (j+1)-th code point is neither a modifier code point nor a connector code point, determining that the (j+1)-th code point does not belong to the initial code point set;
if the state machine state corresponding to the j-th code point is the quasi-drawing character state, then:
when the (j+1)-th code point is a modifier code point, determining that the (j+1)-th code point belongs to the initial code point set,
when the (j+1)-th code point is not a modifier code point, determining that the (j+1)-th code point does not belong to the initial code point set;
if the state machine state corresponding to the j-th code point is the country region state, then:
when the (j+1)-th code point is a country region code point, determining that the (j+1)-th code point belongs to the initial code point set,
when the (j+1)-th code point is not a country region code point, determining that the (j+1)-th code point does not belong to the initial code point set;
if the state machine state corresponding to the j-th code point is the connector state, then:
when the (j+1)-th code point is an expression code point or a digital code point, determining that the (j+1)-th code point belongs to the initial code point set,
and when the (j+1)-th code point is neither an expression code point nor a digital code point, determining that the (j+1)-th code point does not belong to the initial code point set.
Optionally, the initial state of the state machine is a default state, and the method further includes:
when the first code point in all the code points is an expression code point, determining that the state of the state machine corresponding to the first code point is a character drawing state,
when the first code point is a digital code point, determining that the state of the state machine corresponding to the first code point is a quasi-drawing character state,
when the first code point is a country region code point, determining that the state of a state machine corresponding to the first code point is a country region state,
and when the first code point is not any one of the expression code point, the digital code point and the country region code point, determining that the state of the state machine corresponding to the first code point is a default state.
Optionally, after dividing all code points to obtain one or more target code point sets, the method further includes:
and determining the character type corresponding to the target code point set according to the code points in the target code point set.
In a second aspect, there is provided a character string length calculation apparatus, the apparatus including:
an acquisition module, configured to acquire all code points corresponding to the target character string;
a dividing module, configured to divide all the code points to obtain one or more target code point sets, wherein each target code point set comprises one or more code points and corresponds to one character, the character is a non-emoji character or an emoji character, the target code point set corresponding to each non-emoji character comprises one code point, and the target code point set corresponding to each emoji character comprises one or more code points;
a first determining module, configured to determine the number of the target code point sets as the length of the target character string.
Optionally, all the code points include one or more of a digital code point, an expression code point, a country region code point, a modifier code point, and a connector code point.
Optionally, the number of all the code points is n, where n is a positive integer, and the dividing module is configured to:
acquire an initial code point set, wherein the initial code point set comprises the j-th code point of all the code points, and the initial value of j is 1;
execute a code point set division process on the initial code point set, wherein the code point set division process comprises the following steps:
reading the state machine state corresponding to the (j+1)-th code point of all the code points;
determining, according to the state machine state corresponding to the j-th code point and the state machine state corresponding to the (j+1)-th code point, whether the (j+1)-th code point belongs to the initial code point set;
when the (j+1)-th code point belongs to the initial code point set, adding the (j+1)-th code point to the initial code point set to obtain an updated initial code point set,
if j+1 < n and the (j+1)-th code point is not a country region code point, setting j = j+1 and executing the code point set division process again on the updated initial code point set,
if j+1 < n and the (j+1)-th code point is a country region code point, taking the updated initial code point set as a target code point set, generating a new initial code point set that comprises the (j+2)-th code point, setting j = j+2, and executing the code point set division process on the new initial code point set,
if j+1 = n, taking the updated initial code point set as a target code point set;
when the (j+1)-th code point does not belong to the initial code point set, taking the initial code point set as a target code point set and generating a new initial code point set that comprises the (j+1)-th code point,
and if j+1 < n, setting j = j+1 and executing the code point set division process on the new initial code point set, and if j+1 = n, taking the new initial code point set as a target code point set.
Optionally, the dividing module is configured to:
if the state machine state corresponding to the j-th code point is the default state, then:
when the (j+1)-th code point is an expression code point, determine that the state machine state corresponding to the (j+1)-th code point is the character drawing state,
when the (j+1)-th code point is a digital code point, determine that the state machine state corresponding to the (j+1)-th code point is the quasi-drawing character state,
when the (j+1)-th code point is a country region code point, determine that the state machine state corresponding to the (j+1)-th code point is the country region state,
when the (j+1)-th code point is none of an expression code point, a digital code point and a country region code point, determine that the state machine state corresponding to the (j+1)-th code point is the default state;
if the state machine state corresponding to the j-th code point is the character drawing state or the modifier state, then:
when the (j+1)-th code point is a modifier code point, determine that the state machine state corresponding to the (j+1)-th code point is the modifier state,
when the (j+1)-th code point is a connector code point, determine that the state machine state corresponding to the (j+1)-th code point is the connector state,
when the (j+1)-th code point is neither a modifier code point nor a connector code point, take the state machine state corresponding to the j-th code point as the default state, and then read the state machine state corresponding to the (j+1)-th code point from the default state;
if the state machine state corresponding to the j-th code point is the quasi-drawing character state, then:
when the (j+1)-th code point is a modifier code point, determine that the state machine state corresponding to the (j+1)-th code point is the modifier state,
when the (j+1)-th code point is not a modifier code point, take the state machine state corresponding to the j-th code point as the default state, and then read the state machine state corresponding to the (j+1)-th code point from the default state;
if the state machine state corresponding to the j-th code point is the country region state, then:
when the (j+1)-th code point is a country region code point, determine that the state machine state corresponding to the (j+1)-th code point is the country region state,
when the (j+1)-th code point is not a country region code point, take the state machine state corresponding to the j-th code point as the default state, and then read the state machine state corresponding to the (j+1)-th code point from the default state;
if the state machine state corresponding to the j-th code point is the connector state, then:
when the (j+1)-th code point is an expression code point, determine that the state machine state corresponding to the (j+1)-th code point is the character drawing state,
when the (j+1)-th code point is a digital code point, determine that the state machine state corresponding to the (j+1)-th code point is the quasi-drawing character state,
and when the (j+1)-th code point is neither an expression code point nor a digital code point, take the state machine state corresponding to the (j-1)-th code point as the default state, read the state machine state corresponding to the j-th code point again, and then read the state machine state corresponding to the (j+1)-th code point.
Optionally, the dividing module is configured to:
if the state machine state corresponding to the j-th code point is the default state, then:
when the (j+1)-th code point is an expression code point, a digital code point or a country region code point, determine that the (j+1)-th code point belongs to the initial code point set,
when the (j+1)-th code point is none of an expression code point, a digital code point and a country region code point, determine that the (j+1)-th code point does not belong to the initial code point set;
if the state machine state corresponding to the j-th code point is the character drawing state or the modifier state, then:
when the (j+1)-th code point is a modifier code point or a connector code point, determine that the (j+1)-th code point belongs to the initial code point set,
when the (j+1)-th code point is neither a modifier code point nor a connector code point, determine that the (j+1)-th code point does not belong to the initial code point set;
if the state machine state corresponding to the j-th code point is the quasi-drawing character state, then:
when the (j+1)-th code point is a modifier code point, determine that the (j+1)-th code point belongs to the initial code point set,
when the (j+1)-th code point is not a modifier code point, determine that the (j+1)-th code point does not belong to the initial code point set;
if the state machine state corresponding to the j-th code point is the country region state, then:
when the (j+1)-th code point is a country region code point, determine that the (j+1)-th code point belongs to the initial code point set,
when the (j+1)-th code point is not a country region code point, determine that the (j+1)-th code point does not belong to the initial code point set;
if the state machine state corresponding to the j-th code point is the connector state, then:
when the (j+1)-th code point is an expression code point or a digital code point, determine that the (j+1)-th code point belongs to the initial code point set,
and when the (j+1)-th code point is neither an expression code point nor a digital code point, determine that the (j+1)-th code point does not belong to the initial code point set.
Optionally, the initial state of the state machine is a default state, and the apparatus further includes a second determining module, configured to:
when the first code point in all the code points is an expression code point, determining that the state of the state machine corresponding to the first code point is a character drawing state,
when the first code point is a digital code point, determining that the state of the state machine corresponding to the first code point is a quasi-drawing character state,
when the first code point is a country region code point, determining that the state of a state machine corresponding to the first code point is a country region state,
and when the first code point is not any one of the expression code point, the digital code point and the country region code point, determining that the state of the state machine corresponding to the first code point is a default state.
Optionally, the apparatus further comprises:
and the third determining module is used for determining the character type corresponding to the target code point set according to the code points in the target code point set.
In a third aspect, there is provided a character string length calculation apparatus including a processor and a memory;
the memory is configured to store a computer program, the computer program comprising program instructions;
the processor is configured to invoke the computer program to implement the character string length calculation method according to any one of the first aspect.
In a fourth aspect, a computer storage medium is provided, which has instructions stored thereon; when the instructions are executed by a processor, the character string length calculation method according to any one of the first aspect is implemented.
The beneficial effects of the technical solution provided by this application include:
All code points corresponding to the acquired target character string are divided to obtain one or more target code point sets, and the number of target code point sets is then determined as the length of the target character string. Each target code point set corresponds to one character in the target character string, and that character may be an emoji character or a non-emoji character. Taking the number of target code point sets as the length of the target character string therefore gives every emoji character and every non-emoji character a length of 1, which improves the accuracy of character display.
In addition, after the Unicode encoding specification is revised, calculating the character string length with the method provided by this application only requires updating the code point types in the state machine and the state machine state corresponding to each code point in each case, which simplifies maintenance. The state machine can run on various operating systems and therefore has a wide range of application.
Drawings
Fig. 1 is a schematic flowchart of a method for calculating a length of a character string according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of another method for calculating a length of a character string according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a state machine switching states according to the code points it reads, according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a character string length calculation apparatus according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of another character string length calculation apparatus according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of yet another character string length calculation apparatus according to an embodiment of the present application;
Fig. 7 is a block diagram of a character string length calculation apparatus according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of a method for calculating a length of a character string according to an embodiment of the present application. The method may be applied to a computer device. As shown in fig. 1, the method includes:
Step 101: acquiring all code points corresponding to the target character string.
Step 102: dividing all the code points to obtain one or more target code point sets.
Each target code point set comprises one or more code points and corresponds to one character, which is either a non-emoji character or an emoji character. The target code point set corresponding to each non-emoji character comprises one code point, and the target code point set corresponding to each emoji character comprises one or more code points.
Step 103: determining the number of the target code point sets as the length of the target character string.
In summary, in the method for calculating a length of a character string provided in the embodiment of the present application, all code points corresponding to the acquired target character string are divided to obtain one or more target code point sets, and the number of target code point sets is then determined as the length of the target character string. Each target code point set corresponds to one character in the target character string, and that character may be an emoji character or a non-emoji character. Taking the number of target code point sets as the length of the target character string therefore gives every emoji character and every non-emoji character a length of 1, which improves the accuracy of character display.
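Steps 101 to 103 can be sketched end to end as follows. This is an illustrative reading under stated assumptions, not the application's implementation. The code point ranges in `kind_of` are assumptions, and all function and table names are hypothetical. The sketch also assumes that a code point read while the state machine is in the default state always opens a new set, so that each non-emoji character keeps a set of exactly one code point; that a flag is closed after its second country region code point; and that a connector not followed by an expression or digital code point is detached into a set of its own.

```python
def kind_of(ch: str) -> str:
    """Assumed code point types; the ranges are illustrative, not taken from the text."""
    cp = ord(ch)
    if ch in "0123456789#*":
        return "digital"
    if 0x1F1E6 <= cp <= 0x1F1FF:
        return "region"
    if cp == 0x200D:
        return "connector"
    if cp in (0xFE0F, 0x20E3) or 0x1F3FB <= cp <= 0x1F3FF:
        return "modifier"
    if 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF:
        return "expression"
    return "other"


# State entered when a code point of the given type opens a new set.
OPENING_STATE = {"expression": "drawing", "digital": "quasi", "region": "region"}

# (current state, type of next code point) -> state after the code point joins the set.
JOINS = {
    ("drawing", "modifier"): "modifier",
    ("drawing", "connector"): "connector",
    ("modifier", "modifier"): "modifier",
    ("modifier", "connector"): "connector",
    ("quasi", "modifier"): "modifier",
    ("region", "region"): "default",  # assumption: a flag is complete after two code points
    ("connector", "expression"): "drawing",
    ("connector", "digital"): "quasi",
}


def partition(text: str) -> list:
    """Step 102: divide the code points of `text` into target code point sets."""
    sets = []
    state = "default"
    for ch in text:  # step 101: iterating a str yields its code points
        kind = kind_of(ch)
        if (state, kind) in JOINS:
            sets[-1] += ch
            state = JOINS[(state, kind)]
            continue
        if state == "connector":
            # Assumed backtracking: the trailing connector becomes its own set.
            sets[-1], dangling = sets[-1][:-1], sets[-1][-1]
            sets.append(dangling)
        sets.append(ch)
        state = OPENING_STATE.get(kind, "default")
    return sets


def string_length(text: str) -> int:
    """Step 103: the length is the number of target code point sets."""
    return len(partition(text))
```

For example, `string_length("Hi" + "\U0001F1E8\U0001F1F3")` evaluates to 3 under these assumptions, whereas the code point count is 4.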
Fig. 2 is a schematic flowchart of another method for calculating a length of a character string according to an embodiment of the present application. The method may be applied to a computer device. As shown in fig. 2, the method includes:
Step 201: obtaining a target character string.
The target character string in the embodiment of the application comprises one or more characters, which include non-emoji characters and/or emoji characters. Optionally, the target character string may be a character string typed in an input box, a character string input through speech recognition, or any character string stored in a computer device. The embodiment of the present application does not limit how the target character string is obtained.
Step 202: acquiring all code points corresponding to the target character string.
A code point in the embodiment of the application is the code point assigned to a character in Unicode. All code points corresponding to the target character string comprise one or more of digital code points, expression code points, country region code points, modifier code points and connector code points. Each non-emoji character in the target character string corresponds to one code point, and each emoji character corresponds to one or more code points.
For example, an emoji may correspond to a plurality of code points in the following cases:
In the first case: the emoji consists of two country region code points, that is, two consecutive country region code points correspond to one emoji. The emoji is displayed as a national flag on the computer device, and different combinations of country region code points display the flags of different countries.
In the second case: the emoji consists of an expression code point and one or more modifier code points, that is, one expression code point followed by one or more modifier code points corresponds to one emoji. Illustratively, expression code point a is displayed on the computer device as an engineer avatar. When expression code point a is followed by a modifier code point b expressing the female gender, the sequence a + b is displayed as a female engineer avatar. When the sequence a + b is followed by a modifier code point c expressing a brown skin tone, a female engineer avatar with brown skin is displayed. The code points corresponding to these three emoji (the engineer avatar, the female engineer avatar, and the female engineer avatar with brown skin) are therefore a, a + b, and a + b + c, respectively.
In the third case: the emoji consists of the code points of one emoji, a connector code point, and the code points of another emoji, that is, the code points of several emoji joined by connector code points can be displayed as a single new emoji on the computer device. Continuing the second case, the code point sequence a + b + connector code point + a + b represents an emoji showing two female engineers together.
In the fourth case: the emoji consists of a digital code point followed by one or more modifier code points, that is, one digital code point and one or more modifier code points correspond to one emoji.
In the embodiment of the present application, "+" indicates that two code points located before and after the "+" are consecutive code points.
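The five code point types can be pictured as a classification function. The Python sketch below is a minimal illustration only: the function name `classify`, the type labels and, in particular, the numeric ranges are simplifying assumptions chosen for this example (regional indicator symbols, the zero width joiner U+200D, variation selector U+FE0F, the keycap mark U+20E3, the skin tone modifiers, and two broad emoji blocks) and do not reproduce the complete Unicode emoji data.

```python
def classify(cp: int) -> str:
    """Map a code point to one of the code point types named in the text.

    The ranges are assumptions for illustration, not the full Unicode data.
    """
    if 0x30 <= cp <= 0x39:                       # ASCII digits 0-9
        return "digital"
    if 0x1F1E6 <= cp <= 0x1F1FF:                 # regional indicator symbols
        return "country_region"
    if cp == 0x200D:                             # zero width joiner
        return "connector"
    if cp in (0xFE0F, 0x20E3) or 0x1F3FB <= cp <= 0x1F3FF:
        return "modifier"                        # VS-16, keycap, skin tones
    if 0x2600 <= cp <= 0x27BF or 0x1F300 <= cp <= 0x1FAFF:
        return "expression"                      # assumed emoji blocks
    return "other"                               # everything else


for cp in (0x31, 0x1F93E, 0x200D, 0x2640, 0xFE0F):
    print(f"U+{cp:04X}", classify(cp))
```

The modifier test is placed before the expression test because the skin tone modifiers lie inside one of the assumed emoji blocks.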
Step 203: divide all code points corresponding to the target character string to obtain one or more target code point sets.
The number of code points corresponding to the target character string is n, where n is a positive integer. Optionally, step 203 is implemented as follows: an initial code point set is obtained, which includes the j-th code point of all the code points, where the initial value of j is 1; a code point set dividing process is then executed on the initial code point set.
The code point set dividing process includes the following steps S1 to S4:
Step S1: read the state machine state corresponding to the (j+1)-th code point among all code points corresponding to the target character string.
In this embodiment of the present application, the code points corresponding to the target character string are read by a state machine, and the state machine switches state according to each code point it reads. A state machine is essentially a piece of code. Optionally, the state machine has the following six states:
The emoji state indicates that the character corresponding to the read code point is an emoji and that the read code point is an expression code point.
The quasi-emoji state indicates that the character corresponding to the read code point may be an emoji and that the read code point is a digital code point.
The country region state indicates that the character corresponding to the read code point may be an emoji and that the read code point is a country region code point.
The modifier state indicates that the character corresponding to the read code point is an emoji and that the read code point is a modifier code point.
The connector state indicates that the character corresponding to the read code point is an emoji and that the read code point is a connector code point.
The default state indicates that the character corresponding to the read code point is a non-emoji character.
In this embodiment of the present application, the initial state of the state machine is the default state. When the state machine reads the first of all the code points corresponding to the target character string: if the first code point is an expression code point, the state machine state corresponding to the first code point is determined to be the emoji state; if the first code point is a digital code point, the state is determined to be the quasi-emoji state; if the first code point is a country region code point, the state is determined to be the country region state; and if the first code point is none of an expression code point, a digital code point and a country region code point, the state is determined to be the default state.
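The six states and the rule for the first code point could be represented as follows. This is a sketch under stated assumptions: the enumeration `State`, the mapping `FIRST_STATE`, the function `state_for_first` and the string labels for the code point types are all hypothetical names introduced here.

```python
from enum import Enum


class State(Enum):
    DEFAULT = "default"                # non-emoji character
    EMOJI = "emoji"                    # an expression code point was read
    QUASI_EMOJI = "quasi-emoji"        # a digital code point was read
    COUNTRY_REGION = "country region"  # a country region code point was read
    MODIFIER = "modifier"              # a modifier code point was read
    CONNECTOR = "connector"            # a connector code point was read


# State reached from the initial (default) state, keyed by code point type.
FIRST_STATE = {
    "expression": State.EMOJI,
    "digital": State.QUASI_EMOJI,
    "country_region": State.COUNTRY_REGION,
}


def state_for_first(kind: str) -> State:
    """State machine state after reading the first code point."""
    return FIRST_STATE.get(kind, State.DEFAULT)


print(state_for_first("digital"))
```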
Optionally, step S1 can be implemented according to the following five cases:
In the first case, if the state machine state corresponding to the j-th code point is the default state, then:
When the (j+1)-th code point is an expression code point, the state machine state corresponding to the (j+1)-th code point is determined to be the emoji state.
When the (j+1)-th code point is a digital code point, the state machine state corresponding to the (j+1)-th code point is determined to be the quasi-emoji state.
When the (j+1)-th code point is a country region code point, the state machine state corresponding to the (j+1)-th code point is determined to be the country region state.
When the (j+1)-th code point is none of an expression code point, a digital code point and a country region code point, the state machine state corresponding to the (j+1)-th code point is determined to be the default state.
In the second case, if the state machine state corresponding to the j-th code point is the emoji state or the modifier state, then:
When the (j+1)-th code point is a modifier code point, the state machine state corresponding to the (j+1)-th code point is determined to be the modifier state.
When the (j+1)-th code point is a connector code point, the state machine state corresponding to the (j+1)-th code point is determined to be the connector state.
When the (j+1)-th code point is neither a modifier code point nor a connector code point, the state machine state corresponding to the j-th code point is taken as the default state, and the state machine state corresponding to the (j+1)-th code point is then read.
In the third case, if the state machine state corresponding to the j-th code point is the quasi-emoji state, then:
When the (j+1)-th code point is a modifier code point, the state machine state corresponding to the (j+1)-th code point is determined to be the modifier state.
When the (j+1)-th code point is not a modifier code point, the state machine state corresponding to the j-th code point is taken as the default state, and the state machine state corresponding to the (j+1)-th code point is then read.
In the fourth case, if the state machine state corresponding to the j-th code point is the country region state, then:
When the (j+1)-th code point is a country region code point, the state machine state corresponding to the (j+1)-th code point is determined to be the country region state.
When the (j+1)-th code point is not a country region code point, the state machine state corresponding to the j-th code point is taken as the default state, and the state machine state corresponding to the (j+1)-th code point is then read.
In the fifth case, if the state machine state corresponding to the j-th code point is the connector state, then:
When the (j+1)-th code point is an expression code point, the state machine state corresponding to the (j+1)-th code point is determined to be the emoji state.
When the (j+1)-th code point is a digital code point, the state machine state corresponding to the (j+1)-th code point is determined to be the quasi-emoji state.
When the (j+1)-th code point is neither an expression code point nor a digital code point, the state machine state corresponding to the (j-1)-th code point is taken as the default state, the state machine state corresponding to the j-th code point is read again, and the state machine state corresponding to the (j+1)-th code point is then read.
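The five cases above amount to a transition table. The following Python sketch is one possible encoding; the state and type labels and the function `next_state` are illustrative choices, not identifiers from the embodiment. It assumes that "taking the state as the default state and reading again" has the same effect as applying the first case to the (j+1)-th code point, which also covers the fifth case because a connector code point read from the default state leaves the state machine in the default state.

```python
# TRANSITIONS[state][code point type] -> state for the next code point.
# A missing entry means: fall back to the default state and re-read.
TRANSITIONS = {
    "default":        {"expression": "emoji", "digital": "quasi-emoji",
                       "country_region": "country_region"},
    "emoji":          {"modifier": "modifier", "connector": "connector"},
    "modifier":       {"modifier": "modifier", "connector": "connector"},
    "quasi-emoji":    {"modifier": "modifier"},
    "country_region": {"country_region": "country_region"},
    "connector":      {"expression": "emoji", "digital": "quasi-emoji"},
}


def next_state(state: str, kind: str) -> str:
    """Return the state for the (j+1)-th code point given the j-th state."""
    row = TRANSITIONS[state]
    if kind in row:
        return row[kind]
    # No matching transition: treat the previous state as the default state
    # and read the same code point again from there (first case).
    return TRANSITIONS["default"].get(kind, "default")


state = "default"
for kind in ("digital", "expression", "connector", "expression", "modifier"):
    state = next_state(state, kind)
    print(kind, "->", state)
```

Keeping the rules in a table rather than in nested conditionals means that a change to the code point types only requires editing the table entries.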
Exemplarily, fig. 3 is a schematic diagram of the state machine switching state according to the code points it reads, as provided in this embodiment of the present application. As shown in fig. 3, the state pointed to by each arrow is the state machine state corresponding to the currently read code point, determined from the state machine state corresponding to the previous code point. The initial state of the state machine is the default state.
Step S2: determine whether the (j+1)-th code point belongs to the initial code point set according to the state machine state corresponding to the j-th code point and the state machine state corresponding to the (j+1)-th code point.
Optionally, step S2 can be implemented according to the following five cases:
In the first case, if the state machine state corresponding to the j-th code point is the default state, then:
When the (j+1)-th code point is an expression code point, a digital code point or a country region code point, it is determined that the (j+1)-th code point belongs to the initial code point set.
When the (j+1)-th code point is none of an expression code point, a digital code point and a country region code point, it is determined that the (j+1)-th code point does not belong to the initial code point set.
In the second case, if the state machine state corresponding to the j-th code point is the emoji state or the modifier state, then:
When the (j+1)-th code point is a modifier code point or a connector code point, it is determined that the (j+1)-th code point belongs to the initial code point set.
When the (j+1)-th code point is neither a modifier code point nor a connector code point, it is determined that the (j+1)-th code point does not belong to the initial code point set.
In the third case, if the state machine state corresponding to the j-th code point is the quasi-emoji state, then:
When the (j+1)-th code point is a modifier code point, it is determined that the (j+1)-th code point belongs to the initial code point set.
When the (j+1)-th code point is not a modifier code point, it is determined that the (j+1)-th code point does not belong to the initial code point set.
In the fourth case, if the state machine state corresponding to the j-th code point is the country region state, then:
When the (j+1)-th code point is a country region code point, it is determined that the (j+1)-th code point belongs to the initial code point set.
When the (j+1)-th code point is not a country region code point, it is determined that the (j+1)-th code point does not belong to the initial code point set.
In the fifth case, if the state machine state corresponding to the j-th code point is the connector state, then:
When the (j+1)-th code point is an expression code point or a digital code point, it is determined that the (j+1)-th code point belongs to the initial code point set.
When the (j+1)-th code point is neither an expression code point nor a digital code point, it is determined that the (j+1)-th code point does not belong to the initial code point set.
Step S3: when the (j+1)-th code point belongs to the initial code point set, add the (j+1)-th code point to the initial code point set to obtain an updated initial code point set. If j+1 < n and the (j+1)-th code point is not a country region code point, let j = j+1 and execute the code point set dividing process again on the updated initial code point set. If j+1 < n and the (j+1)-th code point is a country region code point, take the updated initial code point set as a target code point set, which corresponds to one emoji, and generate a new initial code point set that includes the (j+2)-th code point; then let j = j+2 and execute the code point set dividing process on the new initial code point set. If j+1 = n, take the updated initial code point set as a target code point set, which corresponds to one emoji.
Step S4: when the (j+1)-th code point does not belong to the initial code point set, take the initial code point set as a target code point set and generate a new initial code point set that includes the (j+1)-th code point. The target code point set may correspond to an emoji or to a non-emoji character, and the character type corresponding to the target code point set can be determined according to the code points in the set. If j+1 < n, let j = j+1 and execute the code point set dividing process on the new initial code point set. If j+1 = n, take the new initial code point set as a target code point set, which corresponds to one non-emoji character.
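Steps S1 to S4 can be sketched as a single pass over the code points. The Python example below is a minimal illustration, not the implementation of the embodiment: `classify`, `START`, `CONTINUE` and `split_into_sets` are hypothetical names, and the code point ranges in `classify` are simplified assumptions. It further assumes that a code point read while the state machine is in the default state always starts a new set, in line with the statement that each non-emoji character corresponds to a single code point, and that a flag is complete after two country region code points.

```python
def classify(cp: int) -> str:
    """Simplified, assumed code point ranges (not the full Unicode data)."""
    if 0x30 <= cp <= 0x39:
        return "digital"
    if 0x1F1E6 <= cp <= 0x1F1FF:
        return "country_region"
    if cp == 0x200D:
        return "connector"
    if cp in (0xFE0F, 0x20E3) or 0x1F3FB <= cp <= 0x1F3FF:
        return "modifier"
    if 0x2600 <= cp <= 0x27BF or 0x1F300 <= cp <= 0x1FAFF:
        return "expression"
    return "other"


# State entered when a code point starts a new set.
START = {"expression": "emoji", "digital": "quasi-emoji",
         "country_region": "country_region"}

# CONTINUE[state][type] -> next state, for code points joining the current set.
CONTINUE = {
    "default": {},
    "emoji": {"modifier": "modifier", "connector": "connector"},
    "modifier": {"modifier": "modifier", "connector": "connector"},
    "quasi-emoji": {"modifier": "modifier"},
    "country_region": {"country_region": "default"},  # pair closes the set
    "connector": {"expression": "emoji", "digital": "quasi-emoji"},
}


def split_into_sets(text: str) -> list[list[int]]:
    """Divide the code points of text into target code point sets."""
    sets: list[list[int]] = []
    state = "default"
    for ch in text:
        cp = ord(ch)
        kind = classify(cp)
        if kind in CONTINUE[state]:
            sets[-1].append(cp)               # step S3: joins the current set
            state = CONTINUE[state][kind]
        else:
            sets.append([cp])                 # step S4: starts a new set
            state = START.get(kind, "default")
    return sets


sets = split_into_sets("1\U0001F93E\u200D\u2640\uFE0F1")
print([len(s) for s in sets])
```

For the code point sequence U+0031 U+1F93E U+200D U+2640 U+FE0F U+0031, the sketch yields three sets containing one, four and one code points.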
For example, in this embodiment of the present application, the implementation of step 203 is described for a target character string whose code points are U+0031 U+1F93E U+200D U+2640 U+FE0F U+0031, so that the number of code points corresponding to the target character string is 6. Here, U+0031 is a digital code point, U+1F93E is an expression code point, U+200D is a connector code point, U+2640 is an expression code point, U+FE0F is a modifier code point, and U+0031 is a digital code point.
1. The first code point, U+0031, is read; it is a digital code point. After the state machine reads this code point, the state machine state is determined to be the quasi-emoji state.
2. An initial code point set is obtained, which includes the first code point.
3. The second code point, U+1F93E, is read; it is an expression code point. Because this code point is not a modifier code point, after the state machine reads it, the state machine state corresponding to the first code point is taken as the default state, and the state machine state corresponding to the second code point is determined to be the emoji state. The second code point does not belong to the initial code point set containing the first code point. The initial code point set containing the first code point is therefore taken as the first target code point set, which corresponds to a non-emoji character, and a new initial code point set is generated that includes the second code point.
4. The third code point, U+200D, is read; it is a connector code point. After the state machine reads this code point, the state machine state is determined to be the connector state. The code point is determined to belong to the initial code point set and is added to it, giving an updated initial code point set that includes the second and third code points.
5. The fourth code point, U+2640, is read; it is an expression code point. After the state machine reads this code point, the state machine state is determined to be the emoji state. The code point is determined to belong to the initial code point set and is added to it, giving an updated initial code point set that includes the second, third and fourth code points.
6. The fifth code point, U+FE0F, is read; it is a modifier code point. After the state machine reads this code point, the state machine state is determined to be the modifier state. The code point is determined to belong to the initial code point set and is added to it, giving an updated initial code point set that includes the second, third, fourth and fifth code points.
7. The sixth code point, U+0031, is read; it is a digital code point. Because this code point is neither a modifier code point nor a connector code point, after the state machine reads it, the state machine state corresponding to the fifth code point is taken as the default state, and the sixth code point does not belong to the initial code point set containing the fifth code point. The initial code point set containing the second, third, fourth and fifth code points is therefore taken as the second target code point set, which corresponds to an emoji, and a new initial code point set is generated that includes the sixth code point. Because the target character string has 6 code points in total, the initial code point set containing the sixth code point is taken as the third target code point set, which corresponds to a non-emoji character.
In this embodiment of the present application, the character type corresponding to a target code point set, namely emoji or non-emoji character, can be determined according to the code points in the target code point set.
In the above example, the code points corresponding to the target character string are U+0031 U+1F93E U+200D U+2640 U+FE0F U+0031, and dividing them yields three target code point sets. The first target code point set corresponds to a non-emoji character and contains one code point. The second target code point set corresponds to an emoji and contains four code points. The third target code point set corresponds to a non-emoji character and contains one code point.
Step 204: determine the number of target code point sets as the length of the target character string.
The code points corresponding to the characters in the target character string are divided into one or more target code point sets, each of which corresponds to one emoji or one non-emoji character. The number of target code point sets is taken as the length of the target character string, that is, every character has a length of 1. In the example of step 203, the code points corresponding to the target character string are U+0031 U+1F93E U+200D U+2640 U+FE0F U+0031, and dividing them yields three target code point sets, so the length of the target character string is 3.
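To make the difference between this length and other common measures concrete, the Python sketch below compares them for the example string. The target code point sets are written out by hand from the example rather than computed, and the variable names are illustrative.

```python
text = "1\U0001F93E\u200D\u2640\uFE0F1"

# Target code point sets from the example in step 203, written out by hand.
target_sets = [[0x31], [0x1F93E, 0x200D, 0x2640, 0xFE0F], [0x31]]

length = len(target_sets)                         # number of target sets
code_point_count = len(text)                      # number of code points
utf16_units = len(text.encode("utf-16-le")) // 2  # number of UTF-16 code units
utf8_bytes = len(text.encode("utf-8"))            # number of UTF-8 bytes

print(length, code_point_count, utf16_units, utf8_bytes)  # 3 6 7 15
```

Only the first value matches the number of characters a user would see, which is the quantity step 204 reports.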
It should be noted that, the order of the steps of the method for calculating the length of a character string provided in the embodiment of the present application may be appropriately adjusted, and the steps may also be increased or decreased according to the circumstances, and any method that can be easily conceived by a person skilled in the art within the technical scope disclosed in the present application shall be included in the protection scope of the present application, and therefore, the details are not described again.
In summary, in the character string length calculation method provided in this embodiment of the present application, all code points corresponding to the obtained target character string are divided into one or more target code point sets, and the number of target code point sets is determined as the length of the target character string. Because each target code point set corresponds to exactly one character in the target character string, whether an emoji or a non-emoji character, every character is counted with a length of 1, which improves the accuracy of character display.
In addition, after the Unicode specification is revised, calculating the character string length with the method provided in this embodiment of the present application only requires updating the code point types used by the state machine and the state machine state corresponding to each code point type in each case, which simplifies maintenance. The state machine can run on various operating systems and therefore has a wide range of application.
Fig. 4 is a schematic structural diagram of a character string length calculation apparatus according to an embodiment of the present application. The device can be applied to computer equipment. As shown in fig. 4, the apparatus 40 includes:
an obtaining module 401, configured to obtain all code points corresponding to the target character string.
A dividing module 402, configured to divide all the code points to obtain one or more target code point sets, where each target code point set includes one or more code points and corresponds to one character, the character being a non-emoji character or an emoji; the target code point set corresponding to each non-emoji character includes one code point, and the target code point set corresponding to each emoji includes one or more code points.
A first determining module 403, configured to determine the number of target code point sets as the length of the target character string.
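The division of work among the three modules might be pictured as follows. This Python sketch is purely illustrative: the class `StringLengthDevice`, its method names and the stand-in divider `one_set_per_code_point` are hypothetical, and the divider shown here does not implement the code point set dividing process; it only marks where that process would be plugged in.

```python
from typing import Callable, List

CodePointSets = List[List[int]]


class StringLengthDevice:
    """Hypothetical wiring of modules 401, 402 and 403."""

    def __init__(self, divide: Callable[[List[int]], CodePointSets]):
        self._divide = divide                          # dividing module 402

    def obtain(self, text: str) -> List[int]:          # obtaining module 401
        return [ord(ch) for ch in text]

    def determine(self, sets: CodePointSets) -> int:   # first determining module 403
        return len(sets)

    def length(self, text: str) -> int:
        return self.determine(self._divide(self.obtain(text)))


def one_set_per_code_point(code_points: List[int]) -> CodePointSets:
    """Stand-in divider for demonstration only: one set per code point."""
    return [[cp] for cp in code_points]


device = StringLengthDevice(one_set_per_code_point)
print(device.length("abc"))  # 3
```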
In summary, in the character string length calculation apparatus provided in this embodiment of the present application, the dividing module divides all code points corresponding to the target character string obtained by the obtaining module into one or more target code point sets, and the first determining module determines the number of target code point sets as the length of the target character string. Because each target code point set corresponds to exactly one character in the target character string, whether an emoji or a non-emoji character, every character is counted with a length of 1, which improves the accuracy of character display.
Optionally, all the code points include one or more of digital code points, expression code points, country region code points, modifier code points and connector code points.
Optionally, the number of all code points is n, where n is a positive integer, and the dividing module 402 is configured to:
Obtain an initial code point set, where the initial code point set includes the j-th code point of all the code points, and the initial value of j is 1.
Execute a code point set dividing process on the initial code point set, where the code point set dividing process includes:
Reading the state machine state corresponding to the (j+1)-th code point of all the code points, and determining whether the (j+1)-th code point belongs to the initial code point set according to the state machine state corresponding to the j-th code point and the state machine state corresponding to the (j+1)-th code point.
When the (j+1)-th code point belongs to the initial code point set, adding the (j+1)-th code point to the initial code point set to obtain an updated initial code point set.
If j+1 < n and the (j+1)-th code point is not a country region code point, letting j = j+1 and executing the code point set dividing process again on the updated initial code point set. If j+1 < n and the (j+1)-th code point is a country region code point, taking the updated initial code point set as a target code point set and generating a new initial code point set that includes the (j+2)-th code point, then letting j = j+2 and executing the code point set dividing process on the new initial code point set. If j+1 = n, taking the updated initial code point set as a target code point set.
When the (j+1)-th code point does not belong to the initial code point set, taking the initial code point set as a target code point set and generating a new initial code point set that includes the (j+1)-th code point.
If j+1 < n, letting j = j+1 and executing the code point set dividing process on the new initial code point set. If j+1 = n, taking the new initial code point set as a target code point set.
Optionally, the dividing module 402 is configured to:
If the state machine state corresponding to the j-th code point is the default state, then:
When the (j+1)-th code point is an expression code point, determine that the state machine state corresponding to the (j+1)-th code point is the emoji state. When the (j+1)-th code point is a digital code point, determine that the state is the quasi-emoji state. When the (j+1)-th code point is a country region code point, determine that the state is the country region state. When the (j+1)-th code point is none of an expression code point, a digital code point and a country region code point, determine that the state is the default state.
If the state machine state corresponding to the j-th code point is the emoji state or the modifier state, then:
When the (j+1)-th code point is a modifier code point, determine that the state machine state corresponding to the (j+1)-th code point is the modifier state. When the (j+1)-th code point is a connector code point, determine that the state is the connector state. When the (j+1)-th code point is neither a modifier code point nor a connector code point, take the state machine state corresponding to the j-th code point as the default state, and then read the state machine state corresponding to the (j+1)-th code point.
If the state machine state corresponding to the j-th code point is the quasi-emoji state, then:
When the (j+1)-th code point is a modifier code point, determine that the state machine state corresponding to the (j+1)-th code point is the modifier state. When the (j+1)-th code point is not a modifier code point, take the state machine state corresponding to the j-th code point as the default state, and then read the state machine state corresponding to the (j+1)-th code point.
If the state machine state corresponding to the j-th code point is the country region state, then:
When the (j+1)-th code point is a country region code point, determine that the state machine state corresponding to the (j+1)-th code point is the country region state. When the (j+1)-th code point is not a country region code point, take the state machine state corresponding to the j-th code point as the default state, and then read the state machine state corresponding to the (j+1)-th code point.
If the state machine state corresponding to the j-th code point is the connector state, then:
When the (j+1)-th code point is an expression code point, determine that the state machine state corresponding to the (j+1)-th code point is the emoji state. When the (j+1)-th code point is a digital code point, determine that the state is the quasi-emoji state. When the (j+1)-th code point is neither an expression code point nor a digital code point, take the state machine state corresponding to the (j-1)-th code point as the default state, read the state machine state corresponding to the j-th code point again, and then read the state machine state corresponding to the (j+1)-th code point.
Optionally, the dividing module 402 is configured to:
If the state machine state corresponding to the j-th code point is the default state, then:
When the (j+1)-th code point is an expression code point, a digital code point or a country region code point, determine that the (j+1)-th code point belongs to the initial code point set. When the (j+1)-th code point is none of an expression code point, a digital code point and a country region code point, determine that the (j+1)-th code point does not belong to the initial code point set.
If the state machine state corresponding to the j-th code point is the emoji state or the modifier state, then:
When the (j+1)-th code point is a modifier code point or a connector code point, determine that the (j+1)-th code point belongs to the initial code point set. When the (j+1)-th code point is neither a modifier code point nor a connector code point, determine that the (j+1)-th code point does not belong to the initial code point set.
If the state machine state corresponding to the j-th code point is the quasi-emoji state, then:
When the (j+1)-th code point is a modifier code point, determine that the (j+1)-th code point belongs to the initial code point set. When the (j+1)-th code point is not a modifier code point, determine that the (j+1)-th code point does not belong to the initial code point set.
If the state machine state corresponding to the j-th code point is the country region state, then:
When the (j+1)-th code point is a country region code point, determine that the (j+1)-th code point belongs to the initial code point set. When the (j+1)-th code point is not a country region code point, determine that the (j+1)-th code point does not belong to the initial code point set.
If the state machine state corresponding to the j-th code point is the connector state, then:
When the (j+1)-th code point is an expression code point or a digital code point, determine that the (j+1)-th code point belongs to the initial code point set. When the (j+1)-th code point is neither an expression code point nor a digital code point, determine that the (j+1)-th code point does not belong to the initial code point set.
Optionally, the initial state of the state machine is the default state. As shown in fig. 5, the apparatus 40 further includes a second determining module 404, where the second determining module 404 is configured to:
When the first of all the code points is an expression code point, determine that the state machine state corresponding to the first code point is the emoji state. When the first code point is a digital code point, determine that the state is the quasi-emoji state. When the first code point is a country region code point, determine that the state is the country region state. When the first code point is none of an expression code point, a digital code point and a country region code point, determine that the state is the default state.
Optionally, as shown in fig. 6, the apparatus 40 further includes:
a third determining module 405, configured to determine, according to code points in the target code point set, a character type corresponding to the target code point set.
In summary, in the character string length calculation apparatus provided in this embodiment of the present application, the dividing module divides all the code points corresponding to the target character string, as obtained by the obtaining module, into one or more target code point sets, and the first determining module determines the number of target code point sets as the length of the target character string. Each target code point set corresponds to one character of the target character string, and that character may be a drawing character or a non-drawing character. Determining the number of target code point sets as the length of the target character string therefore gives every drawing character and every non-drawing character a length of 1, which further improves the accuracy of character display.
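The overall idea (divide the code points into sets, then count the sets) can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the application's implementation: the code point ranges in `classify` are rough stand-ins for the real type tables, all names are hypothetical, any code point read in the default state is treated as opening a new set, and a country region set is closed after its second code point, following the pairing described in the method.

```python
def classify(cp: int) -> str:
    """Rough, assumed code point typing; real tables would be more precise."""
    if 0x1F1E6 <= cp <= 0x1F1FF:          # regional indicator symbols
        return "region"
    if cp == 0x200D:                      # zero width joiner
        return "connector"
    if cp in (0xFE0F, 0x20E3) or 0x1F3FB <= cp <= 0x1F3FF:
        return "modifier"                 # variation selector, keycap, skin tones
    if cp in (0x23, 0x2A) or 0x30 <= cp <= 0x39:
        return "digital"                  # '#', '*', '0'..'9'
    if 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF:
        return "expression"               # approximate emoji ranges
    return "other"


# State entered when a code point opens a new set.
OPENING = {"expression": "drawing", "digital": "quasi_drawing", "region": "region"}

# CONTINUE[state][kind] is the next state when the code point joins the
# current set; a missing entry means the current set is closed.
CONTINUE = {
    "drawing": {"modifier": "modifier", "connector": "connector"},
    "modifier": {"modifier": "modifier", "connector": "connector"},
    "quasi_drawing": {"modifier": "modifier"},
    "region": {"region": "default"},      # a pair is complete: close the set
    "connector": {"expression": "drawing", "digital": "quasi_drawing"},
}


def string_length(text: str) -> int:
    """Count target code point sets; each set counts as one character."""
    count = 0
    state = "default"
    for ch in text:                        # Python iterates str by code point
        kind = classify(ord(ch))
        next_state = CONTINUE.get(state, {}).get(kind)
        if next_state is None:             # code point opens a new set
            count += 1
            state = OPENING.get(kind, "default")
        else:                              # code point joins the current set
            state = next_state
    return count
```

With this sketch, `string_length("\U0001F468\u200d\U0001F469\u200d\U0001F467")` (a family emoji built from three emoji and two joiners) returns 1, whereas Python's built-in `len` reports 5 code points for the same string; two adjacent flags, `"\U0001F1E8\U0001F1F3\U0001F1FA\U0001F1F8"`, return 2.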
In addition, after the Unicode encoding specification is revised, calculating the character string length with the character string length calculation apparatus provided in this embodiment of the present application only requires updating the code point types in the state machine and the state machine state corresponding to each code point in each case, which simplifies maintenance. The state machine can run on various operating systems and therefore has a wide range of application.
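One way to picture this maintenance property is to keep the code point types in a data table, so that a specification change becomes a table edit rather than a logic change. The sketch below is hypothetical throughout: the table layout, the name `classify`, and the added range (taken from the Private Use Area purely for demonstration) are assumptions, not content of the application or of any Unicode release.

```python
from typing import List, Tuple

# (first code point, last code point, type); earlier rows take precedence.
Range = Tuple[int, int, str]

BASE_TABLE: List[Range] = [
    (0x1F3FB, 0x1F3FF, "modifier"),
    (0x1F1E6, 0x1F1FF, "region"),
    (0x200D, 0x200D, "connector"),
    (0x1F300, 0x1FAFF, "expression"),   # approximate range, for illustration
]


def classify(cp: int, table: List[Range]) -> str:
    """Return the type of the first table row whose range contains cp."""
    for first, last, kind in table:
        if first <= cp <= last:
            return kind
    return "other"


# Simulated "specification update": only the table changes, not classify().
UPDATED_TABLE: List[Range] = [(0xE000, 0xE0FF, "expression")] + BASE_TABLE
```

Here `classify(0xE000, BASE_TABLE)` yields `"other"`, while `classify(0xE000, UPDATED_TABLE)` yields `"expression"`; the lookup function itself is untouched.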
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
The embodiment of the application provides a character string length calculation device, which comprises: a processor and a memory.
The memory is configured to store a computer program, the computer program comprising program instructions; the processor is configured to invoke the computer program to implement the character string length calculation method shown in FIG. 1 or FIG. 2.
An embodiment of the present application further provides a computer storage medium storing instructions which, when executed by a processor, implement the character string length calculation method shown in FIG. 1 or FIG. 2.
Illustratively, FIG. 7 is a block diagram of a character string length calculation apparatus provided in an embodiment of the present application. The character string length calculation apparatus may be a computer device 700.
Generally, the computer device 700 includes: a processor 701 and a memory 702.
The processor 701 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 701 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 701 may also include a main processor and a coprocessor, where the main processor, also called a Central Processing Unit (CPU), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 701 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 701 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
Memory 702 may include one or more computer-readable storage media, which may be non-transitory. Memory 702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 702 is used to store at least one instruction for execution by processor 701 to implement a string length calculation method as provided by method embodiments herein.
In some embodiments, the computer device 700 may also optionally include: a peripheral interface 703 and at least one peripheral. The processor 701, the memory 702, and the peripheral interface 703 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 703 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 704, a display screen 705, a camera assembly 706, an audio circuit 707, a positioning component 708, and a power source 709.
The peripheral interface 703 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 701 and the memory 702. In some embodiments, processor 701, memory 702, and peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 701, the memory 702, and the peripheral interface 703 may be implemented on a separate chip or circuit board, which is not limited in this application.
The radio frequency circuit 704 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 704 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 704 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 704 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 704 may communicate with other computer devices via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the World Wide Web, metropolitan area networks, intranets, the various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 704 may also include NFC (Near Field Communication) related circuits, which is not limited in this application.
The display screen 705 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 705 is a touch display screen, the display screen 705 can also capture touch signals on or above its surface. The touch signal may be input to the processor 701 as a control signal for processing. In this case, the display screen 705 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 705, disposed on the front panel of the computer device 700; in other embodiments, there may be at least two display screens 705, disposed on different surfaces of the computer device 700 or in a folding design; in still other embodiments, the display screen 705 may be a flexible display disposed on a curved surface or a folding surface of the computer device 700. The display screen 705 may even be arranged in a non-rectangular, irregular shape, that is, a specially shaped screen. The display screen 705 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 706 is used to capture images or video. Optionally, the camera assembly 706 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the computer device 700 and the rear camera is disposed on the back of the computer device 700. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fused shooting functions. In some embodiments, the camera assembly 706 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuitry 707 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 701 for processing or inputting the electric signals to the radio frequency circuit 704 to realize voice communication. For stereo sound acquisition or noise reduction purposes, the microphones may be multiple and located at different locations on the computer device 700. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 707 may also include a headphone jack.
The positioning component 708 is used to locate the current geographic location of the computer device 700 for navigation or LBS (Location Based Service). The positioning component 708 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
The power supply 709 is used to supply power to the various components of the computer device 700. The power source 709 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When the power source 709 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the computer device 700 also includes one or more sensors 710. The one or more sensors 710 include, but are not limited to: acceleration sensor 711, gyro sensor 712, pressure sensor 713, fingerprint sensor 714, optical sensor 715, and proximity sensor 716.
The acceleration sensor 711 may detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the computer apparatus 700. For example, the acceleration sensor 711 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 701 may control the touch screen 705 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 711. The acceleration sensor 711 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 712 may detect a body direction and a rotation angle of the computer device 700, and the gyro sensor 712 may cooperate with the acceleration sensor 711 to acquire a 3D motion of the user with respect to the computer device 700. From the data collected by the gyro sensor 712, the processor 701 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensors 713 may be disposed on a side bezel of computer device 700 and/or underneath touch display screen 705. When the pressure sensor 713 is disposed on a side frame of the computer device 700, a user's holding signal to the computer device 700 may be detected, and the processor 701 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 713. When the pressure sensor 713 is disposed at a lower layer of the touch display 705, the processor 701 controls the operability control on the UI interface according to the pressure operation of the user on the touch display 705. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 714 is used for collecting a fingerprint of a user, and the processor 701 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 714, or the fingerprint sensor 714 identifies the identity of the user according to the collected fingerprint. When the user identity is identified as a trusted identity, the processor 701 authorizes the user to perform relevant sensitive operations, including unlocking a screen, viewing encrypted information, downloading software, paying, changing settings, and the like. The fingerprint sensor 714 may be disposed on the front, back, or side of the computer device 700. When a physical key or vendor Logo is provided on the computer device 700, the fingerprint sensor 714 may be integrated with the physical key or vendor Logo.
The optical sensor 715 is used to collect the ambient light intensity. In one embodiment, the processor 701 may control the display brightness of the touch display 705 based on the ambient light intensity collected by the optical sensor 715. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 705 is increased; when the ambient light intensity is low, the display brightness of the touch display 705 is turned down. In another embodiment, processor 701 may also dynamically adjust the shooting parameters of camera assembly 706 based on the ambient light intensity collected by optical sensor 715.
The proximity sensor 716, also known as a distance sensor, is typically disposed on the front panel of the computer device 700. The proximity sensor 716 is used to capture the distance between the user and the front of the computer device 700. In one embodiment, when the proximity sensor 716 detects that the distance between the user and the front of the computer device 700 gradually decreases, the processor 701 controls the touch display screen 705 to switch from the bright screen state to the dark screen state; when the proximity sensor 716 detects that the distance between the user and the front of the computer device 700 gradually increases, the processor 701 controls the touch display screen 705 to switch from the dark screen state to the bright screen state.
Those skilled in the art will appreciate that the configuration illustrated in FIG. 7 does not limit the computer device 700, which may include more or fewer components than those illustrated, combine some components, or employ a different arrangement of components.
The above description is only exemplary of the present application and is not intended to limit the present application, and any modifications, equivalents, improvements, etc. made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (10)

1. A method for calculating a length of a character string, the method comprising:
acquiring all code points corresponding to the target character string;
dividing all the code points to obtain one or more target code point sets, wherein each target code point set comprises one or more code points, each target code point set corresponds to one character, the character is a non-drawing character or a drawing character, the target code point set corresponding to each non-drawing character comprises one code point, and the target code point set corresponding to each drawing character comprises one or more code points;
and determining the number of the target code point sets as the length of the target character string.
2. The method of claim 1, wherein all the code points comprise one or more of digital code points, expression code points, country region code points, modifier code points, and connector code points.
3. The method according to claim 1 or 2, wherein the number of all the code points is n, n being a positive integer, and the dividing all the code points to obtain one or more target code point sets comprises:
acquiring an initial code point set, wherein the initial code point set comprises the jth code point of all the code points, and the initial value of j is 1;
executing a code point set division process on the initial code point set, wherein the code point set division process comprises:
reading the state machine state corresponding to the (j+1)th code point of all the code points;
determining, according to the state machine state corresponding to the jth code point and the state machine state corresponding to the (j+1)th code point, whether the (j+1)th code point belongs to the initial code point set;
when the (j+1)th code point belongs to the initial code point set, adding the (j+1)th code point to the initial code point set to obtain an updated initial code point set,
if j+1 < n and the (j+1)th code point is not a country region code point, setting j = j+1 and executing the code point set division process again on the updated initial code point set,
if j+1 < n and the (j+1)th code point is a country region code point, taking the updated initial code point set as a target code point set, generating a new initial code point set that comprises the (j+2)th code point, setting j = j+2, and executing the code point set division process on the new initial code point set,
if j+1 = n, taking the updated initial code point set as a target code point set;
when the (j+1)th code point does not belong to the initial code point set, taking the initial code point set as a target code point set and generating a new initial code point set, wherein the new initial code point set comprises the (j+1)th code point,
if j+1 < n, setting j = j+1 and executing the code point set division process on the new initial code point set,
and if j+1 = n, taking the new initial code point set as a target code point set.
4. The method according to claim 3, wherein the reading the state machine state corresponding to the (j+1)th code point of all the code points comprises:
if the state machine state corresponding to the jth code point is the default state, then:
when the (j+1)th code point is an expression code point, determining that the state machine state corresponding to the (j+1)th code point is the character drawing state,
when the (j+1)th code point is a digital code point, determining that the state machine state corresponding to the (j+1)th code point is the quasi-drawing character state,
when the (j+1)th code point is a country region code point, determining that the state machine state corresponding to the (j+1)th code point is the country region state,
when the (j+1)th code point is not any one of an expression code point, a digital code point, and a country region code point, determining that the state machine state corresponding to the (j+1)th code point is the default state;
if the state machine state corresponding to the jth code point is the character drawing state or the modifier state, then:
when the (j+1)th code point is a modifier code point, determining that the state machine state corresponding to the (j+1)th code point is the modifier state,
when the (j+1)th code point is a connector code point, determining that the state machine state corresponding to the (j+1)th code point is the connector state,
when the (j+1)th code point is neither a modifier code point nor a connector code point, taking the state machine state corresponding to the jth code point as the default state, and reading the state machine state corresponding to the (j+1)th code point;
if the state machine state corresponding to the jth code point is the quasi-drawing character state, then:
when the (j+1)th code point is a modifier code point, determining that the state machine state corresponding to the (j+1)th code point is the modifier state,
when the (j+1)th code point is not a modifier code point, taking the state machine state corresponding to the jth code point as the default state, and reading the state machine state corresponding to the (j+1)th code point;
if the state machine state corresponding to the jth code point is the country region state, then:
when the (j+1)th code point is a country region code point, determining that the state machine state corresponding to the (j+1)th code point is the country region state,
when the (j+1)th code point is not a country region code point, taking the state machine state corresponding to the jth code point as the default state, and reading the state machine state corresponding to the (j+1)th code point;
if the state machine state corresponding to the jth code point is the connector state, then:
when the (j+1)th code point is an expression code point, determining that the state machine state corresponding to the (j+1)th code point is the character drawing state,
when the (j+1)th code point is a digital code point, determining that the state machine state corresponding to the (j+1)th code point is the quasi-drawing character state,
and when the (j+1)th code point is neither an expression code point nor a digital code point, taking the state machine state corresponding to the (j-1)th code point as the default state, re-reading the state machine state corresponding to the jth code point, and then reading the state machine state corresponding to the (j+1)th code point.
5. The method of claim 4, wherein the determining whether the (j+1)th code point belongs to the initial code point set according to the state machine state corresponding to the jth code point and the state machine state corresponding to the (j+1)th code point comprises:
if the state machine state corresponding to the jth code point is the default state, then:
when the (j+1)th code point is an expression code point, a digital code point, or a country region code point, determining that the (j+1)th code point belongs to the initial code point set,
when the (j+1)th code point is not any one of an expression code point, a digital code point, and a country region code point, determining that the (j+1)th code point does not belong to the initial code point set;
if the state machine state corresponding to the jth code point is the character drawing state or the modifier state, then:
when the (j+1)th code point is a modifier code point or a connector code point, determining that the (j+1)th code point belongs to the initial code point set,
when the (j+1)th code point is neither a modifier code point nor a connector code point, determining that the (j+1)th code point does not belong to the initial code point set;
if the state machine state corresponding to the jth code point is the quasi-drawing character state, then:
when the (j+1)th code point is a modifier code point, determining that the (j+1)th code point belongs to the initial code point set,
when the (j+1)th code point is not a modifier code point, determining that the (j+1)th code point does not belong to the initial code point set;
if the state machine state corresponding to the jth code point is the country region state, then:
when the (j+1)th code point is a country region code point, determining that the (j+1)th code point belongs to the initial code point set,
when the (j+1)th code point is not a country region code point, determining that the (j+1)th code point does not belong to the initial code point set;
if the state machine state corresponding to the jth code point is the connector state, then:
when the (j+1)th code point is an expression code point or a digital code point, determining that the (j+1)th code point belongs to the initial code point set,
and when the (j+1)th code point is neither an expression code point nor a digital code point, determining that the (j+1)th code point does not belong to the initial code point set.
6. The method of claim 4 or 5, wherein the state machine initial state is a default state, the method further comprising:
when the first code point in all the code points is an expression code point, determining that the state of the state machine corresponding to the first code point is a character drawing state,
when the first code point is a digital code point, determining that the state of the state machine corresponding to the first code point is a quasi-drawing character state,
when the first code point is a country region code point, determining that the state of a state machine corresponding to the first code point is a country region state,
and when the first code point is not any one of the expression code point, the digital code point and the country region code point, determining that the state of the state machine corresponding to the first code point is a default state.
7. The method of claim 4 or 5, wherein after dividing all code points to obtain one or more target code point sets, the method further comprises:
and determining the character type corresponding to the target code point set according to the code points in the target code point set.
8. An apparatus for calculating a character string length, the apparatus comprising:
the acquisition module is used for acquiring all code points corresponding to the target character string;
the dividing module is used for dividing all the code points to obtain one or more target code point sets, each target code point set comprises one or more code points, each target code point set corresponds to one character, the character is a non-drawing character or a drawing character, the target code point set corresponding to each non-drawing character comprises one code point, and the target code point set corresponding to each drawing character comprises one or more code points;
a first determining module, configured to determine the number of the target code point sets as the length of the target character string.
9. A character string length calculation apparatus, comprising: a processor and a memory;
the memory for storing a computer program, the computer program comprising program instructions;
the processor is configured to invoke the computer program to implement the character string length calculating method according to any one of claims 1 to 7.
10. A computer storage medium having stored thereon instructions which, when executed by a processor, carry out the string length calculation method according to any one of claims 1 to 7.
CN202010152674.9A 2020-03-06 2020-03-06 Character string length calculating method and device and computer storage medium Active CN111339735B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010152674.9A CN111339735B (en) 2020-03-06 2020-03-06 Character string length calculating method and device and computer storage medium

Publications (2)

Publication Number Publication Date
CN111339735A true CN111339735A (en) 2020-06-26
CN111339735B CN111339735B (en) 2023-06-20

Family

ID=71185957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010152674.9A Active CN111339735B (en) 2020-03-06 2020-03-06 Character string length calculating method and device and computer storage medium

Country Status (1)

Country Link
CN (1) CN111339735B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104461054A (en) * 2014-12-16 2015-03-25 飞天诚信科技股份有限公司 String length limited input device and string length limited input method
CN110134920A (en) * 2018-02-02 2019-08-16 中兴通讯股份有限公司 Draw the compatible display methods of text, device, terminal and computer readable storage medium
CN109933751A (en) * 2019-03-20 2019-06-25 腾讯科技(深圳)有限公司 Graphic rendering method, apparatus, computer readable storage medium and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YVESCHEUNG: "A simple tool that can identify emoji in strings (supports JavaScript and Java)", Retrieved from the Internet <URL: https://juejin.cn/post/6844903896419155976> *

Also Published As

Publication number Publication date
CN111339735B (en) 2023-06-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210113

Address after: 511442 3108, 79 Wanbo 2nd Road, Nancun Town, Panyu District, Guangzhou City, Guangdong Province

Applicant after: GUANGZHOU CUBESILI INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 511446 24 / F, building B-1, Wanda Plaza, Panyu District, Guangzhou City, Guangdong Province

Applicant before: GUANGZHOU HUADUO NETWORK TECHNOLOGY Co.,Ltd.

EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20200626

Assignee: GUANGZHOU HUADUO NETWORK TECHNOLOGY Co.,Ltd.

Assignor: GUANGZHOU CUBESILI INFORMATION TECHNOLOGY Co.,Ltd.

Contract record no.: X2021440000054

Denomination of invention: String length calculation method and device, computer storage medium

License type: Common License

Record date: 20210208

GR01 Patent grant