CN111444716A

CN111444716A - Title word segmentation method, terminal and computer readable storage medium

Info

Publication number: CN111444716A
Application number: CN202010235425.6A
Authority: CN
Inventors: 李松
Original assignee: Shenzhen Micropurchase Technology Co ltd
Current assignee: Shenzhen Micropurchase Technology Co ltd
Priority date: 2020-03-30
Filing date: 2020-03-30
Publication date: 2020-07-24

Abstract

The invention discloses a title word segmentation method, a terminal and a computer readable storage medium, wherein the title word segmentation method comprises the following steps: acquiring a title text input by a user, and filtering the title text according to a preset rule to generate a filtered text; calling a word segmentation interface to send the filtered text to an external server corresponding to the word segmentation interface, and receiving a first word segmentation result generated by the external server according to the filtered text; and storing and displaying the first segmentation result. The invention can improve the efficiency of the user in the process of performing title word segmentation and improve the use experience of the user.

Description

Title word segmentation method, terminal and computer readable storage medium

Technical Field

The invention relates to the technical field of data processing, in particular to a title word segmentation method, a terminal and a computer readable storage medium.

Background

At present, in the e-commerce field, merchants use simple word segmentation software when segmenting commodity titles, which does not achieve an ideal word segmentation effect. Moreover, because a merchant usually has many commodities, the merchant generally cannot remember the proper name of a given commodity when editing its title; the merchant has to look up the commodity title and then enter the commodity name manually, so the efficiency of the merchant in performing title word segmentation is low.

Therefore, there is a need to provide a method for segmenting words in a title to solve the above technical problems.

The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.

Disclosure of Invention

The main object of the invention is to provide a title word segmentation method, a terminal and a computer readable storage medium, so as to solve the technical problem of low efficiency when a merchant performs title word segmentation.

In order to achieve the above object, the present invention provides a title word segmentation method, including:

acquiring a title text input by a user, and filtering the title text according to a preset rule to generate a filtered text;

calling a word segmentation interface to send the filtered text to an external server corresponding to the word segmentation interface, and receiving a first word segmentation result generated by the external server according to the filtered text;

and storing and displaying the first segmentation result.

Preferably, the step of obtaining the title text input by the user, and filtering the title text according to a preset rule to generate a filtered text includes:

acquiring a title text input by a user, and judging the type of the title text;

if the type of the title text is digital information, judging whether a second word segmentation result corresponding to the title text exists in a preset database according to the preset database;

if a second word segmentation result corresponding to the title text exists in the preset database, displaying the second word segmentation result on a user interface;

after the step of obtaining the title text input by the user and judging the type of the title text, the method further comprises the following steps:

and if the type of the title text is text information, filtering the title text according to a preset rule to generate a filtered text.

Preferably, before the step of obtaining the title text input by the user and determining the type of the title text, the method further includes:

acquiring a title text input by a user, and judging whether the number of characters of the title text is greater than a preset number of characters;

if the number of characters of the title text is less than or equal to the preset number of characters, executing the step of judging, according to a preset database, whether a second word segmentation result corresponding to the title text exists in the preset database;

if the number of characters of the title text is greater than the preset number of characters, executing the step of acquiring the title text input by the user and judging the type of the title text.

Preferably, the step of obtaining the title text input by the user, and filtering the title text according to a preset rule to generate a filtered text includes:

acquiring a title text input by a user, performing text recognition on the title text, and confirming sensitive characters in the title text;

and deleting the sensitive characters in the title text to generate a filtered text.

Preferably, the step of deleting the sensitive characters in the title text and generating a filtered text includes:

deleting the sensitive characters in the title text, detecting the grammar of the filtered text, and judging whether the grammar of the filtered text conforms to a preset grammar rule;

and if the grammar of the filtered text does not accord with the preset grammar rule, correcting the grammar of the filtered text through a preset correction algorithm to generate the filtered text.

Preferably, after the step of calling a segmentation interface to send the filtered text to an external server corresponding to the segmentation interface and receiving a first segmentation result generated by the external server according to the filtered text, the method includes:

storing the first segmentation result into a cache queue;

and storing the first segmentation result in the cache queue into a preset text file according to a single-thread sequence.

Preferably, after the step of storing and displaying the first segmentation result, the method further comprises:

obtaining the semantics of each phrase of the first word segmentation result;

and generating and displaying synonym phrases whose semantics are similar to those of the phrases.

Preferably, after the step of storing and displaying the first segmentation result, the method further comprises:

and associating and combining all the word groups in the first word segmentation result to generate a title abbreviation.

The present invention also provides a terminal comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the steps of the title word segmentation method as described above.

The present invention also provides a computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the title word segmentation method as described above.

According to the technical scheme, the title text input by a user is obtained, and the title text is filtered according to a preset rule to generate a filtered text; calling a word segmentation interface to send the filtered text to an external server corresponding to the word segmentation interface, and receiving a first word segmentation result generated by the external server according to the filtered text; and storing and displaying the first segmentation result. The efficiency of the user in performing the title word segmentation can be improved.

Drawings

FIG. 1 is a schematic diagram of a hardware structure of a terminal according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a first embodiment of a title word segmentation method according to the present invention;

FIG. 3 is a flowchart illustrating a detailed process of step S100 according to a first embodiment of the title word segmentation method of the present invention;

FIG. 4 is a flowchart illustrating a third embodiment of a title word segmentation method according to the present invention;

FIG. 5 is a flowchart illustrating a detailed process of step S100 according to a first embodiment of the title word segmentation method of the present invention;

FIG. 6 is a flowchart illustrating a flow of step S410 in the fourth embodiment of the title word segmentation method according to the present invention;

FIG. 7 is a flowchart illustrating a sixth exemplary embodiment of a title word segmentation method according to the present invention;

FIG. 8 is a flowchart illustrating a seventh exemplary embodiment of a title word segmentation method according to the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The title word segmentation method related to the embodiment of the invention is mainly applied to a terminal, and the terminal can be a device with display and processing functions, such as a PC (personal computer), a portable computer, a mobile terminal and the like.

Referring to fig. 1, fig. 1 is a schematic diagram of the hardware structure of a terminal according to an embodiment of the present invention. In the embodiment of the present invention, the terminal may include a processor 1001 (e.g., a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used for connection and communication among these components; the user interface 1003 may include a display screen and an input unit such as a keyboard; the network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface); the memory 1005 may be a high-speed RAM or a non-volatile memory (e.g., a magnetic disk memory), and optionally, the memory 1005 may be a storage device independent of the processor 1001.

Those skilled in the art will appreciate that the hardware configuration shown in fig. 1 does not limit the terminal, which may include more or fewer components than those shown, combine some components, or arrange the components differently.

With continued reference to fig. 1, the memory 1005 of fig. 1, which is one type of computer-readable storage medium, may include an operating system, a network communication module, and a title segmentation program.

In fig. 1, the network communication module is mainly used for connecting to a server and performing data communication with the server; and the processor 1001 may call the title segmentation program stored in the memory 1005 and perform the steps of the title segmentation method:

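acquiring a title text input by a user, and filtering the title text according to a preset rule to generate a filtered text;

calling a word segmentation interface to send the filtered text to an external server corresponding to the word segmentation interface, and receiving a first word segmentation result generated by the external server according to the filtered text;

and storing and displaying the first segmentation result.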

Further, the processor 1001 may call a title word segmentation program stored in the memory 1005, and perform the steps of:

acquiring a title text input by a user, and judging the type of the title text;

storing the first segmentation result into a cache queue;

obtaining the semanteme of each phrase of the first word segmentation result;

Based on the hardware structure of the terminal, the invention provides various embodiments of the title word segmentation method.

The invention provides a title word segmentation method.

Referring to fig. 2, in the first embodiment of the present invention, a title word segmentation method includes the following steps:

step S100, acquiring a title text input by a user, and filtering the title text according to a preset rule to generate a filtered text;

in this embodiment, the user may implement the title word segmentation method through a preset application on the terminal or a preset applet in the application, and when the user needs to perform title word segmentation, the application or the applet may be opened and a title text may be input. The title text may be a sentence describing the commodity information, for example, the entered title text may be "marine men's eau de toilette 100 ml".

In this embodiment, the filtering process may remove sensitive characters from the title text, or normalize character forms that do not meet preset regulations. The sensitive characters may be special symbols, spaces, line feed characters, and the like; the character forms that do not meet the preset regulations may concern the upper/lower case of English letters and the traditional/simplified forms of Chinese characters. Filtering the title text in this way generates a filtered text suitable for word segmentation by the external server. For example, when the title text "marine men's eau de toilette 100 ml" is filtered, only numbers, Chinese characters and English characters are retained. The filtered text may be a string of characters without any punctuation marks or spaces.
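
The filtering described above can be sketched in a few lines. This is only an illustrative sketch: the function name `filter_title`, the choice of lower case as the normalized letter form, and the use of the CJK Unified Ideographs range U+4E00 to U+9FFF as "Chinese characters" are assumptions, and traditional/simplified conversion is not covered.

```python
import re

# Assumption: the retained characters are ASCII digits, ASCII letters and
# CJK Unified Ideographs (U+4E00..U+9FFF); everything else is dropped.
_NOT_KEPT = re.compile(r"[^0-9A-Za-z\u4e00-\u9fff]")


def filter_title(title: str) -> str:
    """Return the title without special symbols, spaces or line feeds."""
    # Normalize letter case first (lower case is an arbitrary choice here).
    normalized = title.lower()
    # Delete every character that is not a digit, a letter or a Chinese character.
    return _NOT_KEPT.sub("", normalized)
```

Under these assumptions, `filter_title("海洋 男士·淡香水 100ML")` returns `"海洋男士淡香水100ml"`.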

Step S110, calling a word segmentation interface to send the filtered text to an external server corresponding to the word segmentation interface, and receiving a first word segmentation result generated by the external server according to the filtered text;

In this embodiment, the filtered text may be sent to an external server by calling a word segmentation interface provided by, for example, Alibaba Cloud or Tencent Cloud, so that the external server segments the filtered text to obtain a first word segmentation result; the first word segmentation result returned through the word segmentation interface is then received.

Optionally, the text information may also be segmented by calling an IK word segmentation library.

And step S120, storing and displaying the first word segmentation result.

In this embodiment, the first segmentation result is a plurality of phrases, and a user may view the plurality of phrases through a user display interface.

Preferably, the user can click the phrase to be combined through the user display interface.

In this embodiment, the user may obtain the first word segmentation result by inputting the title text through an application on the terminal or an applet in the application, so that the efficiency of the user in segmenting the title text is improved.
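
Steps S100 to S120 can be sketched end to end as follows. The sketch does not call any real cloud interface: the word segmentation interface is represented by an injected callable, and `fake_segmenter`, `segment_title` and the in-memory `store` list are hypothetical names introduced only for illustration.

```python
import re
from typing import Callable, List

# Hypothetical stand-in type for the external word segmentation interface.
Segmenter = Callable[[str], List[str]]


def fake_segmenter(text: str) -> List[str]:
    """Local placeholder: split into runs of letters, digits or Chinese characters."""
    return re.findall(r"[A-Za-z]+|[0-9]+|[\u4e00-\u9fff]+", text)


def segment_title(title: str, segmenter: Segmenter, store: List[List[str]]) -> List[str]:
    # S100: filter the title text (keep digits, letters and Chinese characters).
    filtered = re.sub(r"[^0-9A-Za-z\u4e00-\u9fff]", "", title)
    # S110: hand the filtered text to the segmentation interface.
    result = segmenter(filtered)
    # S120: store the first word segmentation result and display it.
    store.append(result)
    print(" | ".join(result))
    return result
```

Passing the segmenter in as a parameter keeps the sketch independent of any particular provider; a real deployment would replace `fake_segmenter` with a function that calls the chosen interface.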

Further, a second embodiment is proposed based on the first embodiment, and referring to fig. 3, in this embodiment, the step S100 includes:

step S200, acquiring a title text input by a user, and judging the type of the title text;

step S210, if the type of the title text is digital information, judging whether a second word segmentation result corresponding to the title text exists in a preset database according to the preset database;

step S220, if a second word segmentation result corresponding to the title text exists in the preset database, displaying the second word segmentation result on a user interface;

step S230, if the second segmentation result corresponding to the title text does not exist in the preset database, generating and displaying a prompt message that the second segmentation result corresponding to the title text cannot be queried;

In this embodiment, the type of the title text input by the user may be digital information, that is, a number corresponding to a commodity title; for example, the number corresponding to "marine men's eau de toilette 100 ml" may be "100". Specifically, a database may be preset in the memory; the preset database stores each commodity title, the number corresponding to each commodity title, and the second segmentation result corresponding to each commodity title. When the user inputs a number, a matching query can be carried out directly in the preset database according to the number. When a second segmentation result corresponding to the number exists in the preset database, the second segmentation result is displayed directly on the user interface; when no second segmentation result corresponding to the title text exists in the preset database, prompt information indicating that no result can be found is generated and displayed to remind the user that the search failed.

step S240, if the type of the title text is text information, filtering the title text according to a preset rule to generate a filtered text.

In this embodiment, if the type of the title text is text information, it may be determined that the information input by the user is a title text, and the step of filtering the title text according to the preset rule to generate the filtered text is then executed.

In this embodiment, the type of the information input by the user is judged; when the user inputs digital information, the second segmentation result is queried directly from the preset database, which improves the response speed and prevents the user from maliciously invoking the word segmentation interface.
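
The branching of steps S200 to S240 can be sketched as below. The preset database is modelled as a plain dictionary, and `PRESET_DB`, `NOT_FOUND` and `dispatch_by_type` are hypothetical names; the stored entry is made-up sample data, not taken from the source.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical preset database: commodity number -> stored second segmentation result.
PRESET_DB: Dict[str, List[str]] = {
    "100": ["marine", "men's", "eau de toilette", "100 ml"],
}
NOT_FOUND = "No segmentation result could be found for this number."


def dispatch_by_type(text: str, db: Dict[str, List[str]]) -> Tuple[str, Union[str, List[str]]]:
    """Return (action, payload) for the input, following steps S200 to S240."""
    text = text.strip()
    if text.isascii() and text.isdigit():   # S200: the input is digital information
        result = db.get(text)                # S210: query the preset database
        if result is not None:
            return ("show", result)          # S220: display the stored result
        return ("prompt", NOT_FOUND)         # S230: nothing stored for this number
    return ("filter", text)                  # S240: text information, go on to filtering
```

Because numeric inputs never reach the `"filter"` branch, they never trigger a call to the external interface, which is the effect the embodiment describes.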

Further, a third embodiment is proposed based on the second embodiment, and in this embodiment, referring to fig. 4, before the step S200, the method further includes:

step S300, acquiring a title text input by a user, and judging whether the number of characters of the title text is greater than a preset number of characters;

if the number of characters of the title text is less than or equal to the preset number of characters, executing step S210;

if the number of characters of the title text is greater than the preset number of characters, step S200 is executed.

In this embodiment, when the user inputs the title text, it may be judged whether the number of characters of the title text is greater than a preset number of characters; the preset number of characters may be set to a small value. When the number of characters of the title text is less than or equal to the preset number of characters, the title text input by the user may be considered to be a number or a relatively simple title text. When the user inputs a number, a matching query can be carried out directly in the preset database according to the number; when a second segmentation result corresponding to the number exists in the preset database, the second segmentation result is displayed directly on the user interface. When no second segmentation result corresponding to the title text exists in the preset database, or when the information input by the user is a relatively simple title text, prompt information indicating that no result can be found is generated and displayed to remind the user that the search failed.

In this embodiment, it is judged whether the number of characters of the title text input by the user is greater than the preset number of characters. When the number of characters is less than or equal to the preset number of characters, the title text may be considered to be a number or a relatively simple title text: a number can be queried in the preset database and the second segmentation result returned, which improves the response speed, while a relatively simple title text is not searched, which prevents the user from entering a large number of simple title texts that need no segmentation and occupying server resources.
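
A minimal sketch of the length check in step S300 follows. The threshold value `PRESET_CHAR_COUNT = 6` is an assumption (the text only says the preset number may be small), and `route_by_length` is a hypothetical name.

```python
from typing import Dict, List, Tuple, Union

PRESET_CHAR_COUNT = 6  # assumed threshold, chosen only for illustration


def route_by_length(text: str, db: Dict[str, List[str]],
                    limit: int = PRESET_CHAR_COUNT) -> Tuple[str, Union[str, List[str]]]:
    """Short inputs go straight to the database lookup; longer ones to type judging."""
    if len(text) > limit:
        return ("judge_type", text)          # continue with step S200
    result = db.get(text)                    # step S210: direct lookup for short inputs
    if result is not None:
        return ("show", result)              # a stored second segmentation result exists
    # Short input with no stored result: a number not in the database
    # or a relatively simple title text, neither of which is segmented.
    return ("prompt", "No segmentation result could be found.")
```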

Further, a fourth embodiment is proposed based on the first embodiment, and in this embodiment, referring to fig. 5, the step S100 includes:

step S400, acquiring a title text input by a user, performing text recognition on the title text, and confirming sensitive characters in the title text;

and step S410, deleting the sensitive characters in the title text to generate a filtered text.

In this embodiment, text recognition may be performed on the title text to confirm the sensitive characters in it. The sensitive characters include special symbols, spaces, line breaks and the like, and may further include undesirable characters, such as those related to politics. Deleting the sensitive characters from the title text generates a filtered text, which not only facilitates the subsequent word segmentation through the word segmentation interface but also makes the filtered text more standardized, improving the user experience.
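
Steps S400 and S410 can be sketched as two small functions, one that confirms the sensitive characters and one that deletes them. The list of undesirable terms is supplied by the caller; `find_sensitive` and `delete_sensitive` are hypothetical names, and treating every non-word character plus the underscore as a "special symbol" is an assumption.

```python
import re
from typing import List, Sequence

# Assumption: any non-word character (symbols, spaces, line breaks) or underscore is sensitive.
_SYMBOLS = re.compile(r"[\W_]")


def find_sensitive(title: str, banned_terms: Sequence[str]) -> List[str]:
    """S400: list the sensitive symbols and banned terms that occur in the title."""
    found = _SYMBOLS.findall(title)
    found += [term for term in banned_terms if term in title]
    return found


def delete_sensitive(title: str, banned_terms: Sequence[str]) -> str:
    """S410: delete banned terms first, then symbols, spaces and line breaks."""
    for term in banned_terms:
        title = title.replace(term, "")
    return _SYMBOLS.sub("", title)
```

Banned terms are removed before the symbols here, so a term that is split by a symbol in the input is not detected; a stricter variant would need its own matching rule.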

Further, a fifth embodiment is proposed based on the fourth embodiment, and in this embodiment, referring to fig. 6, the step S410 includes:

step S500, deleting the sensitive characters in the title text, detecting the grammar of the filtered text, and judging whether the grammar of the filtered text conforms to a preset grammar rule;

step S510, if the grammar of the filtered text does not accord with the preset grammar rule, correcting the grammar of the filtered text through a preset correction algorithm to generate a filtered text;

and if the grammar of the filtered text conforms to the preset grammar rule, not processing.

In this embodiment, deleting the sensitive characters in the title text may break the grammar of the title text. Therefore, the grammar of the filtered text is detected to judge whether it conforms to a preset grammar rule. For example, continuous English words or continuous numbers (such as "this is") exist in the original title text, and a single unbroken string of letters or numbers is obtained through the filtering process because the space between them has been deleted. In this case, the grammar of the filtered text is corrected through a preset correction algorithm, which solves grammar errors caused by filtering. The title text input by the user may itself also contain grammar problems, and the preset correction algorithm solves such user grammar errors as well.
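
The source does not specify the preset correction algorithm. As one hypothetical form such a correction could take, the sketch below keeps a single space wherever deleting separators would otherwise fuse two runs of English letters or digits, and deletes separators everywhere else. The name `filter_keeping_word_breaks` and the rule itself are assumptions made for illustration.

```python
import re

# Runs of characters that the filter would delete.
_SEP_RUN = re.compile(r"[^0-9A-Za-z\u4e00-\u9fff]+")
_ASCII_ALNUM = re.compile(r"[0-9A-Za-z]")


def filter_keeping_word_breaks(title: str) -> str:
    """Delete separators, but keep one space between adjacent English/number runs."""

    def replace(match: "re.Match[str]") -> str:
        # Characters immediately before and after the separator run ('' at the edges).
        before = title[match.start() - 1:match.start()]
        after = title[match.end():match.end() + 1]
        if _ASCII_ALNUM.fullmatch(before) and _ASCII_ALNUM.fullmatch(after):
            return " "   # the deletion would fuse two tokens, so keep a boundary
        return ""        # elsewhere the separator is simply deleted

    return _SEP_RUN.sub(replace, title)
```

With this rule, "this is" stays "this is" instead of collapsing into one string, while separators next to Chinese characters are still removed.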

Further, a sixth embodiment is proposed based on the first embodiment, and in this embodiment, referring to fig. 7, after step S110, the method includes:

step S600, storing the first segmentation result into a cache queue;

step S610, storing the first segmentation result in the cache queue into a preset text file according to a single thread sequence.

In this embodiment, the preset text file may be a preset txt text file; the first segmentation results in the cache queue are stored in the txt text file one after another by a single thread, and the txt text file may be associated with a database of an IK segmenter.
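
Steps S600 and S610 can be sketched with the standard `queue` and `threading` modules: results are put on a queue, and a single writer thread appends them to a text file in arrival order. The function names, the one-result-per-line file layout and the `None` sentinel are assumptions of this sketch.

```python
import queue
import threading
from typing import Iterable, List, Optional


def drain_to_file(cache: "queue.Queue[Optional[List[str]]]", path: str) -> None:
    """Single writer thread: append queued results to the text file in FIFO order."""
    with open(path, "a", encoding="utf-8") as out:
        while True:
            result = cache.get()
            if result is None:               # sentinel: no more results will arrive
                break
            out.write(" ".join(result) + "\n")


def persist_results(results: Iterable[List[str]], path: str) -> None:
    cache: "queue.Queue[Optional[List[str]]]" = queue.Queue()
    writer = threading.Thread(target=drain_to_file, args=(cache, path))
    writer.start()
    for result in results:
        cache.put(result)                    # S600: store the result in the cache queue
    cache.put(None)                          # tell the writer thread to stop
    writer.join()                            # S610 is complete once the writer exits
```

Using exactly one writer thread means the file is only ever written from one place, so the stored order matches the queue order without extra locking.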

Further, a seventh embodiment is proposed based on the first embodiment, and in this embodiment, referring to fig. 8, after the step S120, the method includes:

step S700, obtaining the semantics of each phrase of the first word segmentation result;

step S710, generating and displaying synonym phrases whose semantics are similar to those of each phrase.

In this embodiment, in addition to displaying each phrase of the first segmentation result on the user display interface, the terminal may also identify the semantics of each phrase, generate synonym phrases whose semantics are similar to those of each phrase, and display the synonym phrases on the user interface.
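
The source does not say how semantics are obtained. As a deliberately simple stand-in, the sketch below looks each phrase up in a small hand-written thesaurus; `THESAURUS` and `synonym_phrases` are hypothetical, and the entries are made-up sample data.

```python
from typing import Dict, List, Sequence

# Hypothetical thesaurus: phrase -> phrases with similar meaning (sample data only).
THESAURUS: Dict[str, List[str]] = {
    "perfume": ["fragrance", "scent"],
    "men's": ["male", "for men"],
}


def synonym_phrases(phrases: Sequence[str],
                    thesaurus: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map every phrase of the segmentation result to its known synonym phrases."""
    # Phrases without an entry map to an empty list rather than raising an error.
    return {phrase: thesaurus.get(phrase, []) for phrase in phrases}
```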

Further, an eighth embodiment is proposed based on the first embodiment, and in this embodiment, after the step S120, the method includes:

associating and combining all the phrases in the first word segmentation result to generate a title abbreviation.

In this embodiment, in addition to displaying each phrase of the first word segmentation result on the user interface, the terminal may associate and combine the phrases to generate a title abbreviation.

Preferably, the user can also select phrases manually, and the terminal generates a title abbreviation according to the phrases selected by the user and the order in which they were selected. After obtaining the first word segmentation result, the user does not need to record it manually and then combine the phrases to obtain the title abbreviation; the title abbreviation can be obtained directly through the terminal.
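
Generating the title abbreviation can be sketched as joining phrases in a given order. The function name `build_abbreviation`, the index-based representation of the user's selection and the `separator` parameter are assumptions of this sketch.

```python
from typing import Optional, Sequence


def build_abbreviation(phrases: Sequence[str],
                       selection: Optional[Sequence[int]] = None,
                       separator: str = "") -> str:
    """Combine phrases into a title abbreviation.

    With no selection, all phrases are combined in their original order;
    otherwise the phrases at the selected indices are joined in click order.
    """
    order = range(len(phrases)) if selection is None else selection
    return separator.join(phrases[i] for i in order)
```

For example, with the phrases `["海洋", "男士", "淡香水", "100ml"]`, selecting index 2 and then index 0 gives `"淡香水海洋"`.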

In addition, the invention also provides a computer readable storage medium.

The computer readable storage medium of the present invention stores a title segmentation program, wherein when the title segmentation program is executed by a processor, the steps of the title segmentation method are implemented.

The method implemented when the title segmentation program is executed may refer to each embodiment of the title segmentation method of the present invention, and will not be described herein again.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering. These words may be interpreted as names.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all modifications and equivalents of the present invention, which are made by the contents of the present specification and the accompanying drawings, or directly/indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A title word segmentation method is characterized by comprising the following steps:

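acquiring a title text input by a user, and filtering the title text according to a preset rule to generate a filtered text;

calling a word segmentation interface to send the filtered text to an external server corresponding to the word segmentation interface, and receiving a first word segmentation result generated by the external server according to the filtered text;

and storing and displaying the first segmentation result.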

2. The title word segmentation method of claim 1, wherein the step of obtaining the title text input by the user, and filtering the title text according to a preset rule to generate a filtered text comprises:

acquiring a title text input by a user, and judging the type of the title text;

3. The title word segmentation method of claim 2, wherein the step of obtaining the title text input by the user and determining the type of the title text is preceded by the step of:

4. The title word segmentation method of claim 1, wherein the step of obtaining the title text input by the user, and filtering the title text according to a preset rule to generate a filtered text comprises:

5. The title segmentation method of claim 4, wherein the step of deleting the sensitive characters in the title text to generate a filtered text comprises:

6. The title segmentation method of claim 1, wherein the step of invoking a segmentation interface to send the filtered text to an external server corresponding to the segmentation interface and receiving a first segmentation result generated by the external server according to the filtered text comprises:

storing the first segmentation result into a cache queue;

7. The title segmentation method of claim 1, wherein the step of storing and displaying the first segmentation result is followed by:

obtaining the semanteme of each phrase of the first word segmentation result;

8. The title segmentation method of claim 1, wherein the step of storing and displaying the first segmentation result is followed by:

9. A terminal comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program, when executed by the processor, performs the steps of the title word segmentation method according to any one of claims 1 to 8.

10. A computer-readable storage medium, having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the title segmentation method according to any one of claims 1 to 8.