CN111339751A

CN111339751A - Text keyword processing method, device and equipment

Info

Publication number: CN111339751A
Application number: CN202010412802.9A
Authority: CN
Inventors: 刘凡
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2020-05-15
Filing date: 2020-05-15
Publication date: 2020-06-26

Abstract

The embodiments of this specification provide a text keyword processing method, apparatus, and device. The method comprises: performing word segmentation on a sentence text to be processed to obtain the word segments in the sentence text, and tagging each word segment with its part of speech; then analyzing the syntactic dependency relationships among the word segments according to their parts of speech, thereby obtaining a word segment sample corresponding to each word segment; and finally, taking the word segment samples as input to a pre-trained keyword recognition model to obtain a keyword label for each word segment, from which the keyword information of the sentence text is derived.

Description

Text keyword processing method, device and equipment

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a method, an apparatus, and a device for processing text keywords.

Background

Keyword extraction is an important means of quickly identifying the topic of a piece of information, and has important applications in fields such as information retrieval and natural language processing. For example, in a specific service field, a service party may receive a large amount of problem information fed back by users every day; in order to pinpoint hot-spot problems and provide solutions as soon as possible, the service party has to spend a large amount of time extracting the problems stated by users from massive data.

Therefore, there is a need to provide a more reliable solution.

Disclosure of Invention

The embodiment of the specification provides a text keyword processing method, a text keyword processing device and text keyword processing equipment, so that keywords in a text can be efficiently and accurately extracted.

An embodiment of the present specification further provides a text keyword processing method, including:

performing dependency syntax analysis based on the sentence text after word segmentation and part-of-speech tagging to obtain the syntax dependency relationship of each word segmentation in the sentence text;

generating a participle sample corresponding to each participle in the sentence text based on the syntactic dependency relationship and the part of speech of each participle in the sentence text;

respectively taking the segmentation samples corresponding to the segmentation as input of a keyword recognition model to obtain keyword labels corresponding to the segmentation, wherein the keyword recognition model is obtained by training based on training segmentation samples corresponding to batch sentence texts and the keyword labels corresponding to the training segmentation samples, and the training segmentation samples have the characteristics with the same dimensionality as the segmentation samples;

and obtaining the keyword information of the sentence text based on each participle and the corresponding keyword label.

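An embodiment of the present specification further provides a text keyword processing method, including:

performing word segmentation and part-of-speech tagging on the batch sentence texts;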

performing dependency syntax analysis on the sentence text after word segmentation and part-of-speech tagging to obtain the syntax dependency relationship of each word segmentation in each sentence text;

generating training participle samples corresponding to the participles in each sentence text based on the syntactic dependency relationship and the part of speech of the participles in each sentence text;

and taking training word segmentation samples corresponding to each segmentation word in each sentence text as the input of the keyword recognition model, taking the keyword labels of the segmentation words as the output of the keyword recognition model, and training the keyword recognition model.

An embodiment of the present specification further provides a text keyword processing apparatus, including:

the first processing module is used for carrying out dependency syntactic analysis on the sentence text after word segmentation and part-of-speech tagging to obtain the syntactic dependency relationship of each word segmentation in the sentence text;

the second processing module is used for generating a participle sample corresponding to each participle in the sentence text based on the syntactic dependency relationship and the part of speech of each participle in the sentence text;

the model identification module is used for respectively taking the participle samples corresponding to the participles as input of a keyword identification model so as to obtain keyword labels corresponding to the participles, the keyword identification model is obtained by training based on training participle samples corresponding to batch sentence texts and the keyword labels corresponding to the training participle samples, and the training participle samples have the characteristics with the same dimensionality as the participle samples;

and the third processing module is used for obtaining the keyword information of the sentence text based on each participle and the corresponding keyword label.

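An embodiment of the present specification further provides a text keyword processing apparatus, including:

the first processing module is used for performing word segmentation and part-of-speech tagging on the batch sentence texts;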

the second processing module is used for carrying out dependency syntax analysis on the sentence texts subjected to word segmentation and part-of-speech tagging processing to obtain the syntax dependency relationship of each word segmentation in each sentence text;

the third processing module is used for generating training participle samples corresponding to the participles in each sentence text based on the syntactic dependency relationship and the part of speech of the participles in each sentence text;

and the model training module is used for taking training participle samples corresponding to the participles in each sentence text as the input of the keyword recognition model, taking the keyword labels of the participles as the output of the keyword recognition model and training the keyword recognition model.

An embodiment of the present specification further provides an electronic device, including:

a processor; and

a memory arranged to store computer executable instructions that, when executed, cause the processor to:

obtaining keyword information of the sentence text based on each participle and the corresponding keyword label;

or, alternatively, to perform the steps of the model training method described above.

Embodiments of the present specification also provide a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements the steps of the text keyword processing method described above or, alternatively, the steps of the model training method described above.

In one embodiment of this specification, the part of speech of each word segment in the sentence text and the syntactic dependency relationships among the word segments are analyzed together and used as input to a pre-trained keyword recognition model, so that the keyword information in the sentence text is extracted and the efficiency and accuracy of keyword extraction are effectively improved.

Drawings

The accompanying drawings, which are included to provide a further understanding of the specification and constitute a part of it, illustrate embodiments of the specification and, together with the description, serve to explain the specification without limiting it. In the drawings:

Fig. 1a and Fig. 1b are schematic diagrams of application scenarios provided in the present specification;

Fig. 2 is a schematic flowchart of a text keyword processing method according to an embodiment of the present disclosure;

Fig. 3 is a diagram of dependency parsing provided by an embodiment of the present description;

Fig. 4 is a flowchart illustrating a text keyword processing method according to another embodiment of the present disclosure;

Fig. 5 is a schematic structural diagram of a text keyword processing apparatus according to an embodiment of the present disclosure;

Fig. 6 is a schematic structural diagram of a text keyword processing apparatus according to another embodiment of the present disclosure;

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

In order to make the objects, technical solutions and advantages of the present disclosure more clear, the technical solutions of the present disclosure will be clearly and completely described below with reference to the specific embodiments of the present disclosure and the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments obtained by a person skilled in the art without making any inventive step based on the embodiments in this description belong to the protection scope of this document.

An application scenario of the present specification is exemplified below.

Referring to fig. 1a, the first application scenario includes: user equipment 101 and a service server 102, wherein:

the user equipment 101 is used for the user to feed back problem information to a service party, such as payment problems encountered in a certain service handling process;

the service server 102 is used for collecting a problem information set fed back by a user group and extracting keywords in each problem information; then, the extracted keywords are subjected to statistical analysis, so that one or more hot spot problems are obtained, and a solution of the hot spot problems is generated and provided for a user.

Referring to fig. 1b, the second application scenario includes: user equipment 101', server 102', and database 103, wherein:

the user equipment 101' is used for a user to send an access request to the server 102', wherein the access request carries information indicating the specified data to be accessed;

the server 102' is used for processing the information indicating the specified data to extract the keywords therein, generating a search formula, retrieving the specified data corresponding to the keywords from the database 103, and providing the retrieved data to the user.

The user equipment refers to terminal equipment used by a user, and can be a PC (personal computer), a smart phone, a tablet computer and other mobile terminals; the server refers to the device owned by the business party or the retrieval service provider.

It should be understood that the above two application scenarios are only examples of application scenarios to which the present solution is applicable, and are not limited, and as for other application scenarios, the description of the present specification is not further expanded.

The technical solutions provided by the embodiments of the present description are described in detail below with reference to the accompanying drawings.

Fig. 2 is a schematic flowchart of a text keyword processing method provided in an embodiment of the present specification, where the method may be executed by the server in fig. 1a or fig. 1b, with reference to fig. 2, and the method may specifically include the following steps:

step 202, performing dependency syntax analysis based on the sentence text after word segmentation and part-of-speech tagging to obtain the syntax dependency relationship of each word segmentation in the sentence text;

the word segmentation and part-of-speech tagging in step 202 refer to a process of performing word segmentation processing on a sentence text and performing part-of-speech tagging on each segmented word obtained by the word segmentation processing. The process may specifically be exemplified by:

Suppose the sentence text is 'Madi you all not know!' (a word-for-word gloss; roughly, 'You don't even know Madi!'). The sentence text is input into an integrated word segmentation and part-of-speech tagging model, which segments 'Madi you all not know!' into word segments and tags each word segment with its part of speech, yielding the word segmentation results and part-of-speech features shown in Table 1 below.

The integrated model may be, for example, an integrated word segmentation and part-of-speech tagging model based on an LSTM neural network, a Han language processing model (e.g., HanLP), and the like. In the tags, nrf denotes a transliterated personal name, a subclass of noun n; rr denotes a personal pronoun, a subclass of pronoun r; d denotes an adverb; v denotes a verb; and w denotes punctuation.

It is understood that a plurality of sentences may exist in the text, and each sentence can be subjected to word segmentation and part-of-speech tagging in turn by the integrated model. For convenience of description and understanding, the text of a single sentence is used as an example for description.
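The output of the word segmentation and part-of-speech tagging step can be pictured as a list of tagged tokens. The following is a minimal sketch, not the integrated model itself: the `TaggedToken` class and `tag_tokens` helper are hypothetical names introduced here, and the word/tag pairs are hard-coded from the example sentence (as listed in Table 3) rather than produced by a real tagger.

```python
from dataclasses import dataclass

# Meanings of the tags named in the text.
POS_TAGS = {
    "nrf": "transliterated personal name",
    "rr": "personal pronoun",
    "d": "adverb",
    "v": "verb",
    "w": "punctuation",
}


@dataclass(frozen=True)
class TaggedToken:
    position: int  # 0-based output order of the word segment
    word: str
    pos: str


def tag_tokens(pairs):
    """Wrap (word, tag) pairs, standing in for the output of an integrated model."""
    tokens = []
    for position, (word, pos) in enumerate(pairs):
        if pos not in POS_TAGS:
            raise ValueError(f"unknown part-of-speech tag: {pos}")
        tokens.append(TaggedToken(position, word, pos))
    return tokens


# Hard-coded stand-in for the model output on the example sentence.
tokens = tag_tokens([
    ("Madi", "nrf"), ("you", "rr"), ("all", "d"),
    ("not", "d"), ("know", "v"), ("!", "w"),
])
```

Recording the position at this stage is convenient because later embodiments reuse the output order of the word segments as a feature.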

In the dependency syntax analysis of step 202, the sentence is parsed into a dependency syntax tree that describes the dependency relationships among the words, that is, it identifies the syntactic collocation relationships among the words, which are associated with the semantics. The dependency relationships include: subject-predicate, verb-object, fronted object, pivotal construction, attribute-head, verb-complement, coordination, preposition-object, left adjunct, right adjunct, independent structure, core, and the like. The dependency parsing model may be, for example, a CRF-based Chinese dependency parsing model.

The dependency parsing can be implemented by a dependency parsing model, for example, the results of the segmentation and part-of-speech tagging shown in table 1 are input to the dependency parsing model, so as to obtain the dependency relationship between the segmentation output by the dependency parsing model.

The following illustrates dependency syntax analysis:

Example 1: the sentence text 'You don't even know Madi!'

Dependency syntax analysis is performed on the text after word segmentation and part-of-speech tagging; the resulting syntactic dependency relationships among the word segments of the sentence are shown in Table 2 below.

The core word sequence number is the sequence number of the core word of each word segment; if a word segment is its own core word, the number is marked as 0. The core word is the word segment that occupies the dominant position in the dependency relationships, namely 'know'. For the dependency relationships, two labeling rules are illustrated here:

Labeling rule 1: labeling based on the part of speech of the word segments

If 'know' is a verb and 'Madi' is the object of that verb, the two may be labeled with a verb-object relationship; if 'you' is the subject and 'know' is its predicate, the two may be labeled with a subject-predicate relationship.

By analogy, the relationships among the participles can be labeled.

Labeling rule 2: labeling based on the event content corresponding to the text

If 'you' is the party that performs the event and 'Madi' is the content of the event, then 'Madi' may be labeled as content and 'you' as agent; similarly, 'all' and 'not' may be labeled as degree.
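The result of dependency labeling can be held as one row per word segment, each pointing at its core word. The sketch below is illustrative only and rests on several assumptions not stated in the text: sequence numbers are taken to be 1-based so that 0 can mark the core word itself, every other word segment is assumed to depend directly on the core word, and the relation strings (including "agent" for 'you') follow the event-content labels as read from Table 3. The function name `dependency_rows` is hypothetical.

```python
def dependency_rows(words, relations, core_index):
    """Build one row per word segment for a flat tree rooted at the core word.

    core_index is the 0-based index of the core word in `words`.
    """
    rows = []
    for number, (word, relation) in enumerate(zip(words, relations), start=1):
        is_core = (number - 1 == core_index)
        rows.append({
            "number": number,                                # 1-based sequence number (assumed)
            "word": word,
            "core_number": 0 if is_core else core_index + 1,  # 0 marks the core word itself
            "relation": relation,
        })
    return rows


words = ["Madi", "you", "all", "not", "know", "!"]
relations = ["content", "agent", "degree", "degree", "core", "punctuation"]
rows = dependency_rows(words, relations, core_index=4)
```

A real dependency parser would return a general tree; the flat shape is used here only because the example sentence has a single dominating word.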

Example 2: the sentence text 'The meeting announced the list of the first batch of senior academicians'

Through the word segmentation and part-of-speech tagging in step 202, the word segmentation result is 'meeting', 'announced', the aspect particle following 'announced', 'first batch', 'senior', 'academician', and 'list'. A dependency syntax tree (see fig. 3) is then parsed in combination with the part of speech of each word segment, and its dependency relationships may be labeled with either of the two labeling rules described above, so that the dependency syntax tree describes the dependency relationships among the word segments.

The embodiment of the present specification shows a specific implementation of step 202. Of course, it should be understood that this step may also be implemented in other ways, which the embodiments of this specification do not limit. In this way, a trained model is used to learn the dependency relationships among the word segments and to perform the dependency syntax analysis, which can effectively improve the accuracy of the parsed dependency relationships and supports the subsequent keyword extraction. In addition, the embodiments of this specification provide two dependency labeling rules, with which dependencies can be labeled from two perspectives, namely the part of speech of the word segments and the event content, so that the accuracy of the parsed dependency relationships can be further improved.

Step 204, generating a participle sample corresponding to each participle in the sentence text based on the syntactic dependency relationship and the part of speech of each participle in the sentence text;

specifically, each word segment together with its part of speech and its syntactic dependency relationship may be used as features to obtain the word segment sample corresponding to that word segment; for example, the word segment sample corresponding to the word segment 'Madi' in Table 2 includes at least 'Madi', 'nrf', and 'content'.
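Step 204 can be sketched as packing the three features into one record per word segment. This is a minimal illustration under assumptions: the helper names `make_sample` and `to_features` and the `name=value` string encoding are hypothetical choices made here, not something the specification prescribes.

```python
def make_sample(word, pos, relation):
    """Bundle a word segment with its part of speech and dependency relation."""
    return {"word": word, "pos": pos, "relation": relation}


def to_features(sample):
    """Encode a sample as sparse string features, one per field (assumed encoding)."""
    return [f"{name}={value}" for name, value in sample.items()]


sample = make_sample("Madi", "nrf", "content")
features = to_features(sample)
```

Keeping the field names fixed is what gives inference samples the same feature dimensions as the training samples.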

Step 206, respectively taking the segmentation samples corresponding to the segmentation as input of a keyword recognition model to obtain keyword labels corresponding to the segmentation, wherein the keyword recognition model is obtained by training based on training segmentation samples corresponding to batch sentence texts and keyword labels corresponding to the training segmentation samples, and the training segmentation samples and the segmentation samples have the same dimensional characteristics;

and step 208, obtaining the keyword information of the sentence text based on each participle and the corresponding keyword label.

The keyword label represents the keyword information of the word segment to which a word segment sample of the sentence text, or a training word segment sample, corresponds; a training word segment sample includes at least the word segment and its part-of-speech and syntactic dependency features.

Assuming that the keyword tags include key tags and non-key tags, where the key tags and the non-key tags are respectively used to represent the participles corresponding to the tags as the keywords and the non-keywords, one implementation manner of step 208 may be:

selecting the word segments marked with key labels to determine the key word segments among the word segments; then obtaining the keyword information of the sentence text based on the key word segments, for example by using the key word segments of the sentence text as the keyword information for subsequent processing.
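With binary labels, this implementation of step 208 reduces to a filter over the model output. The sketch below assumes the labels arrive as a list aligned with the word segments and uses the hypothetical strings "key" and "non-key"; the example labels are illustrative values, not real model output.

```python
def extract_keywords(words, labels, key_label="key"):
    """Return the word segments whose keyword label marks them as key."""
    if len(words) != len(labels):
        raise ValueError("words and labels must be aligned one-to-one")
    return [word for word, label in zip(words, labels) if label == key_label]


words = ["Madi", "you", "all", "not", "know", "!"]
labels = ["key", "non-key", "non-key", "key", "key", "non-key"]  # illustrative labels
keywords = extract_keywords(words, labels)
```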

Further, the embodiment further subdivides the key labels, so that the key labels include a plurality of key level labels, and the plurality of key level labels are used for representing a plurality of key levels of the participles, that is, in the training process, the key levels are labeled for the participles corresponding to the training participle samples, so that the model can learn, and therefore the model can output not only the key words but also the key levels of the key words.

Wherein, the key level labeling rule comprises:

a first key level label corresponds to first-category word segments, and to second-category word segments that are negative auxiliary words; the first-category word segments express the semantics of the sentence text, such as the subject, predicate, and object; the second-category word segments modify the first-category word segments, such as adverbs, adjectives, and nouns;

a second key level label corresponds to second-category word segments that are not negative auxiliary words;

and a third key level label corresponds to third-category word segments, which modify the second-category word segments.

By analogy, the key level label corresponding to each category of participles can be obtained.

And the key levels of the participles represented by the first key level label, the second key level label and the third key level label are sequentially reduced.

In addition, for a sentence text in the form of a compound sentence, a first target word segment in the main clause corresponds to the nth key level label, and a second target word segment in a subordinate clause of the same compound sentence corresponds to the (n+1)th key level label;

the first target participle and the second target participle belong to the same class of participles, n is greater than or equal to 1, and the key level of the participle represented by the nth key level label is greater than the key level of the participle represented by the (n + 1) th key level label.
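The compound-sentence rule amounts to shifting the level by one for word segments in a subordinate clause. A minimal sketch, assuming the caller already knows the level n that the word segment's category would receive in a main clause and whether the word segment sits in a subordinate clause; the function name `clause_adjusted_level` is hypothetical.

```python
def clause_adjusted_level(base_level, in_subordinate_clause):
    """Return the key level label number for a word segment.

    base_level: the level n (>= 1) its category receives in a main clause.
    A larger number means a lower key level.
    """
    if base_level < 1:
        raise ValueError("base_level must be >= 1")
    return base_level + 1 if in_subordinate_clause else base_level


main_clause_level = clause_adjusted_level(1, in_subordinate_clause=False)
subordinate_level = clause_adjusted_level(1, in_subordinate_clause=True)
```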

Accordingly, another implementation of step 208 may be:

firstly, determining the key level of each key word segmentation based on the key level label corresponding to each key word segmentation; and then, obtaining the keyword information of the sentence text based on each key participle and the corresponding key level, namely using part or all of each key participle as the keyword information.

The rule for selecting a subset of the key word segments as keywords can be set flexibly and is not limited here. For example, if the number of keywords identified by the model exceeds a preset threshold, the keywords with higher key levels are selected from them and used as the keyword information of the sentence text.
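One possible selection rule is sketched below. The text leaves open how many keywords to keep once the threshold is exceeded; this sketch assumes the threshold itself is the number kept, and treats a smaller level number as a higher key level. The function name `select_keywords` and the sample words are hypothetical.

```python
def select_keywords(leveled_keywords, max_keywords):
    """Keep all keywords if within the threshold, otherwise the highest-level ones.

    leveled_keywords: list of (word, level) pairs with level >= 1,
    where level 1 is the highest key level.
    """
    if len(leveled_keywords) <= max_keywords:
        return [word for word, _ in leveled_keywords]
    # sorted() is stable, so words of equal level keep their sentence order.
    ranked = sorted(leveled_keywords, key=lambda pair: pair[1])
    return [word for word, _ in ranked[:max_keywords]]


selected = select_keywords([("slow", 2), ("payment", 1), ("very", 3)], max_keywords=2)
```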

The present embodiment shows two specific implementations of step 208. Of course, it should be understood that step 208 may also be implemented in other ways, which the embodiments of this specification do not limit. By introducing key level labels into model training, the model can both extract keywords and assign key levels to them, which broadens the applicability and recognition capability of the model and provides more support for subsequent keyword processing.

The following describes a specific application of the text keyword processing method corresponding to fig. 2, taking the application scenario shown in fig. 1a as an example:

assuming that the sentence text is a service problem text describing a problem a user encountered when handling a service, the service party can collect the service problem texts fed back by the user group to obtain a service problem text set. Each service problem text in the set is then processed through steps 202 to 204 to obtain the word segment sample corresponding to each word segment in that text; the word segment samples are input into the keyword recognition model to obtain the keyword label of each word segment; and the keywords of the whole service problem text set are then counted and analyzed to generate service topics, helping the service party quickly pinpoint the service problems fed back by users.
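The final counting step of this scenario can be sketched with a frequency table over the keywords extracted from each text. The keyword lists below are invented sample data and `hot_keywords` is a hypothetical helper; how the service party turns the counts into topics is not specified in the text.

```python
from collections import Counter


def hot_keywords(keyword_lists, top_n):
    """Count keywords across all problem texts and return the most frequent ones."""
    counts = Counter(word for keywords in keyword_lists for word in keywords)
    return counts.most_common(top_n)


# Invented keyword lists, one per service problem text.
extracted = [
    ["payment", "failed"],
    ["payment", "timeout"],
    ["payment", "failed"],
]
top = hot_keywords(extracted, top_n=2)
```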

In summary, the embodiments of this specification analyze the part of speech of each word segment in the sentence text together with the syntactic dependency relationships among the word segments, and use both as input to a pre-trained keyword recognition model to extract the keyword information of the sentence text, thereby effectively improving the efficiency and accuracy of keyword extraction.

In another possible embodiment, based on the embodiment corresponding to fig. 2, the training word segment sample further includes the position of the word segment in the sentence text, recorded as a lexeme (word position) feature.

Specifically, while word segmentation is performed on a sentence text used for training, the output order of the word segments is recorded; the position of each word segment in the sentence text is then marked based on that output order, thereby obtaining the lexeme feature of each word segment. For example, the lexeme of a word segment corresponds to its order in the sentence text, as detailed in Table 3 below.

Further, the training word segment sample also includes a word distance feature of the word segment. The word distance feature represents the distance between the position of each dominated word and the position of the core word in the same sentence text, where the core word and the dominated words are, respectively, the dominating word segment and the dominated word segments in the syntactic dependency relationships; the word distance feature of the core word itself is a specific value.

Specifically, the core word and the dominated words in a training sentence text are first determined based on the dependency relationships among its word segments; then, based on the positions of the core word and the dominated words in the sentence text, the distance between each dominated word and the core word is determined, and the word distance of the core word is set to a specific value. Taking 'You don't even know Madi!' as an example:

based on the dependency relationships among the word segments shown in Table 2, 'know' is determined to be the dominating word segment and is denoted as the core word, while the other word segments are dominated by 'know' and are denoted as dominated words. Then, combining the positions of 'know' and the other word segments in the text sample, the distance between each word segment and 'know' is obtained. For example, 'not' and '!' are both adjacent to 'know' (position 4), at positions 3 and 5 respectively, so the word distance between 'not' and 'know' is -1 and the word distance between '!' and 'know' is 1; the word distance of the core word itself is labeled as a specific value, such as 0.

Based on this, we can obtain training participles samples after participles and part-of-speech tags, and word positions and word distance tags, as shown in table 3 below:

Keyword level	Word position	Word segment	Part of speech	Distance from core word	Dependency relationship
1	0	Madi	nrf	-4	Content
0	1	you	rr	-3	Agent
0	2	all	d	-2	Degree
1	3	not	d	-1	Degree
1	4	know	v	0	Core
0	5	！	w	1	Punctuation dependency

TABLE 3

The embodiment of the present specification shows a specific implementation of the aforementioned lexeme labeling step and word distance labeling step. Of course, it should be understood that these two labeling steps may also be implemented in other ways, which the embodiments of this specification do not limit. By introducing the lexeme and word distance features of the word segments into model training, the model can better learn the overall contextual structure among the word segments of a sentence, thereby improving the precision of keyword extraction.
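The word position and word distance columns of Table 3 follow directly from the output order of the word segments and the index of the core word. A minimal sketch, assuming 0-based positions and a signed distance (position minus core position) with 0 as the specific value for the core word, which is what the table values suggest; the function name is hypothetical.

```python
def position_and_distance(words, core_index, core_value=0):
    """Return (position, word, distance to core word) for every word segment."""
    features = []
    for position, word in enumerate(words):
        if position == core_index:
            distance = core_value  # specific value reserved for the core word
        else:
            distance = position - core_index
        features.append((position, word, distance))
    return features


words = ["Madi", "you", "all", "not", "know", "!"]
features = position_and_distance(words, core_index=4)
distances = [distance for _, _, distance in features]
```

The resulting distances, -4, -3, -2, -1, 0, and 1, match the "Distance from core word" column of Table 3.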

Fig. 4 is a schematic flowchart of a text keyword processing method provided in an embodiment of this specification, and referring to fig. 4, the method may specifically include the following steps:

step 402, performing word segmentation and part-of-speech tagging on the batch sentence texts;

specifically, each sentence text is respectively input into the word segmentation and part-of-speech tagging integrated model, and the word segmentation and part-of-speech tagging in each sentence text are obtained.

Step 404, performing dependency syntax analysis on the sentence text after the word segmentation and part-of-speech tagging processing to obtain a syntax dependency relationship of each word segmentation in each sentence text;

specifically, the sentence text result after each word segmentation and part-of-speech tagging is input into the dependency relationship analysis model, so as to obtain the dependency relationship between the words in each sentence text, as shown in table 2 above.

Step 406, generating training participle samples corresponding to the participles in each sentence text based on the syntactic dependency relationship and the part of speech of the participles in each sentence text;

specifically, the position of each word segment in the sentence text is determined while step 402 is executed; after the core word of the sentence text is determined from the syntactic dependency relationships, the distance between each word segment and the core word is calculated, giving the lexeme feature and the word distance feature of each word segment. This yields a training word segment sample that includes at least the word segment, its part-of-speech tag, its syntactic dependency relationship, its lexeme feature, and its word distance feature.

Step 408, taking the training word segment sample corresponding to each word segment in each sentence text as the input of the keyword recognition model, taking the keyword label of the word segment as the output of the keyword recognition model, and training the keyword recognition model, so that the trained keyword recognition model can be used to identify the keyword information of a sentence text to be recognized.

It should be noted that, since the specific implementation manners of the steps 402 to 408 are similar to the specific implementation manners of the steps 202 to 208 in the embodiment corresponding to fig. 2, and have been described in detail in the embodiment corresponding to fig. 2, the description of the similarity is omitted here.
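Steps 402 to 408 can be illustrated end to end with a toy classifier. The specification does not name the model type, so the multi-class perceptron below is only a stand-in chosen because it fits in a few lines of standard-library Python; the training rows are taken from Table 3, and the function names and the `name=value` feature encoding are assumptions of this sketch.

```python
from collections import defaultdict


def featurize(sample):
    """Encode a word segment sample as sparse string features."""
    return [f"{name}={value}" for name, value in sample.items()]


def predict(weights, sample):
    """Return the label whose weights score the sample highest."""
    feats = featurize(sample)
    # Iterate labels in sorted order so ties are broken deterministically.
    return max(sorted(weights), key=lambda label: sum(weights[label][f] for f in feats))


def train_perceptron(samples, labels, epochs=10):
    """Multi-class perceptron: one weight table per keyword label."""
    weights = {label: defaultdict(float) for label in set(labels)}
    for _ in range(epochs):
        for sample, gold in zip(samples, labels):
            guess = predict(weights, sample)
            if guess != gold:
                # Move weights toward the gold label and away from the wrong guess.
                for feat in featurize(sample):
                    weights[gold][feat] += 1.0
                    weights[guess][feat] -= 1.0
    return weights


# Training word segment samples built from the rows of Table 3.
rows = [
    ("Madi", "nrf", "content", 0, -4, 1),
    ("you", "rr", "agent", 1, -3, 0),
    ("all", "d", "degree", 2, -2, 0),
    ("not", "d", "degree", 3, -1, 1),
    ("know", "v", "core", 4, 0, 1),
    ("!", "w", "punctuation", 5, 1, 0),
]
samples = [
    {"word": w, "pos": p, "relation": r, "position": i, "distance": d}
    for w, p, r, i, d, _ in rows
]
labels = [level for *_, level in rows]

weights = train_perceptron(samples, labels)
predicted = [predict(weights, sample) for sample in samples]
```

Because inference reuses `featurize`, samples for a new sentence carry the same feature dimensions as the training samples. A single sentence is far too little data for a useful model; the point is only the shape of the input and output.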

The labeling rules of the key level labels are explained in detail as follows:

the head words of the components that express the semantics of the text, such as the subject, predicate, and object, are taken as primary keywords;

adverbs, adjectives, nouns, and the like that modify the primary keywords are taken as secondary keywords;

if a secondary keyword is a negative auxiliary word, it is upgraded to a primary keyword;

words that modify the secondary keywords are taken as tertiary keywords;

pronouns, prepositions, interjections, and the like are not taken as keywords;

interrogative words are not labeled as keywords.

The labeling mode is as follows:

0: non-keywords

1: level one (i.e., first key level label)

2: level two (i.e., second key level label)

3: level three (i.e., third key level label)
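The labeling rules and the 0 to 3 codes above can be sketched as a small decision function, for instance to help annotate training data. Everything about the interface is an assumption of this sketch: the role names, the word-class names, and the choice to let the exclusion rules take precedence over the role (which is consistent with the pronoun 'you' carrying 0 in the first column of Table 3).

```python
NON_KEYWORD, LEVEL_1, LEVEL_2, LEVEL_3 = 0, 1, 2, 3

# Word classes that are never labeled as keywords (names are hypothetical).
EXCLUDED_CLASSES = {"pronoun", "preposition", "interjection", "interrogative"}


def key_level(role, word_class, is_negative=False):
    """Assign a key level code to a word segment.

    role: "head" for the head word of a subject, predicate, or object;
          "modifies_level_1" or "modifies_level_2" for modifiers.
    """
    if word_class in EXCLUDED_CLASSES:
        return NON_KEYWORD
    if role == "head":
        return LEVEL_1
    if role == "modifies_level_1":
        # A negative auxiliary word is upgraded from level two to level one.
        return LEVEL_1 if is_negative else LEVEL_2
    if role == "modifies_level_2":
        return LEVEL_3
    return NON_KEYWORD


know_level = key_level("head", "verb")
not_level = key_level("modifies_level_1", "adverb", is_negative=True)
you_level = key_level("head", "pronoun")
```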

The embodiment of the present specification shows a specific implementation of the model training step. Of course, it should be understood that the model training step may also be implemented in other ways, which the embodiments of this specification do not limit. On this basis, first, the embodiments introduce the word segments, their parts of speech, and their syntactic dependency relationships into model training, so that the model learns how syntactic dependency and sentence semantics affect keyword extraction, which improves its extraction precision; second, features such as word position and word distance are also introduced into training, so that the overall contextual structure of the sentence is better taken into account and the model's keyword extraction capability is enhanced; third, key level labels are introduced into training, so that the model can not only find the keywords in a sentence text but also give their key levels, providing data support for subsequent processing.

Fig. 5 is a schematic structural diagram of a text keyword processing apparatus according to an embodiment of the present disclosure. Referring to fig. 5, the apparatus may specifically include: a first processing module 501, a second processing module 502, a model identification module 503 and a third processing module 504, wherein:

the first processing module 501 performs dependency syntax analysis based on the sentence text after the word segmentation and part-of-speech tagging to obtain a syntax dependency relationship of each word segmentation in the sentence text;

the second processing module 502 generates a participle sample corresponding to each participle in the sentence text based on the syntactic dependency and the part of speech of each participle in the sentence text;

the model identification module 503 uses the participle sample corresponding to each participle as input to a keyword recognition model to obtain the keyword label corresponding to each participle, where the keyword recognition model is trained on training participle samples corresponding to a batch of sentence texts and the keyword labels corresponding to those training participle samples, and the training participle samples have features of the same dimensions as the participle samples;

the third processing module 504 obtains the keyword information of the sentence text based on each participle and the corresponding keyword tag.
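The data flow through modules 502 to 504 can be sketched as follows. This is a minimal illustration that assumes the output of module 501 is already available as (word, part of speech, dependency relation) triples; the function names, the relation tags `SBV`, `ADV`, `HED` and `VOB`, and `toy_model` (a stand-in for the trained keyword recognition model) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# One participle after word segmentation, part-of-speech tagging and
# dependency analysis: (word, part of speech, dependency relation).
Parsed = Tuple[str, str, str]
Sample = Dict[str, str]


def generate_samples(parsed: List[Parsed]) -> List[Sample]:
    """Second processing module: one participle sample per participle."""
    return [{"word": w, "pos": p, "dep": d} for w, p, d in parsed]


def identify(samples: List[Sample], model: Callable[[Sample], int]) -> List[int]:
    """Model identification module: one keyword label per sample."""
    return [model(s) for s in samples]


def keyword_info(samples: List[Sample], labels: List[int]) -> List[str]:
    """Third processing module: keep participles whose label is non-zero."""
    return [s["word"] for s, y in zip(samples, labels) if y != 0]


def toy_model(sample: Sample) -> int:
    # Stand-in for the trained keyword recognition model (assumption):
    # treats subject, head and object relations as keywords.
    return 1 if sample["dep"] in {"SBV", "HED", "VOB"} else 0


parsed_sentence = [
    ("account", "n", "SBV"),
    ("cannot", "d", "ADV"),
    ("log_in", "v", "HED"),
]
samples = generate_samples(parsed_sentence)
labels = identify(samples, toy_model)
keywords = keyword_info(samples, labels)
```

Passing the model in as a callable keeps the sample generation and the keyword collection independent of whichever model is actually trained.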

Optionally, the keyword labels include key labels and non-key labels, which respectively indicate that the corresponding participle is a keyword or a non-keyword;

the third processing module 504 determines, among the participles, the key participles corresponding to key labels, and obtains the keyword information of the sentence text based on the key participles.

Optionally, the key labels include a plurality of key level labels, and the plurality of key level labels are used for characterizing a plurality of key levels of the participle;

the third processing module 504 determines the key level of each key participle based on the key level label corresponding to that key participle, and obtains the keyword information of the sentence text based on the key participles and their corresponding key levels.
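As a rough sketch of how keyword information might be assembled from key level labels, the function below groups key participles by level. The function name and the example words and labels are hypothetical; the label values follow the labeling mode described earlier (0 for non-keywords, 1 to 3 for key levels).

```python
from collections import defaultdict
from typing import Dict, List


def keywords_by_level(participles: List[str], labels: List[int]) -> Dict[int, List[str]]:
    """Group key participles by key level; label 0 marks a non-keyword."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for word, label in zip(participles, labels):
        if label != 0:
            grouped[label].append(word)
    # Return a plain dict ordered from level one downward.
    return {level: grouped[level] for level in sorted(grouped)}


result = keywords_by_level(
    ["payment", "not", "very", "successful", "why"],
    [1, 1, 3, 2, 0],
)
```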

Optionally, the key level labels corresponding to the training participle samples include:

a first key level label corresponding to a first category of participles, or to a second category of participles that are negative auxiliary words, wherein the first category of participles expresses the semantics of the sentence text and the second category of participles modifies the first category of participles;

a third key level label corresponding to a third category of participles, wherein the third category of participles modifies the second category of participles;

The key level labels corresponding to the training participle samples further include:

an n-th key level label corresponding to a first target participle in a main clause, and an (n+1)-th key level label corresponding to a second target participle in a subordinate clause, wherein the main clause and the subordinate clause belong to the same compound sentence;

Optionally, the training participle sample further includes: the position of the participle in the sentence text.

Optionally, the training participle sample further includes: a word distance feature of the participle, which represents the distance between the position of each participle and the position of the core word in the same sentence text, the core word being the dominant participle in the syntactic dependency relationships of the participles.
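The word position and word distance features can be illustrated with a short sketch. It assumes that the dependency analysis yields, for every participle, the index of its governing participle, and that the core word is the root of the dependency tree (marked with -1); the function name and the use of an absolute index difference as the distance are assumptions, not details given in this specification.

```python
from typing import List, Tuple


def position_and_distance(heads: List[int]) -> List[Tuple[int, int]]:
    """Return (word position, word distance) for every participle.

    heads[i] is the 0-based index of the participle that governs
    participle i, or -1 for the core word (assumed here to be the root
    of the dependency tree).
    """
    core = heads.index(-1)  # raises ValueError if no core word is marked
    return [(i, abs(i - core)) for i in range(len(heads))]


# "account cannot log_in": "log_in" (index 2) governs the other two words.
features = position_and_distance([2, 2, -1])
```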

Therefore, the part of speech of each participle in the sentence text and the syntactic dependency relationships between the participles are analyzed together and used as the input of the pre-trained keyword recognition model, so that the keyword information in the sentence text is extracted and the efficiency and accuracy of keyword extraction are effectively improved.

Fig. 6 is a schematic structural diagram of a text keyword processing apparatus according to another embodiment of the present disclosure, and referring to fig. 6, the apparatus may specifically include: a first processing module 601, a second processing module 602, a third processing module 603, and a model training module 604, wherein:

the first processing module 601 is used for performing word segmentation and part-of-speech tagging on a batch of sentence texts;

the second processing module 602 performs dependency syntax analysis based on the sentence text after the word segmentation and part-of-speech tagging processing to obtain a syntax dependency relationship of each word segmentation in each sentence text;

the third processing module 603 generates training participle samples corresponding to the participles in each sentence text based on the syntactic dependency and the part of speech of the participles in each sentence text;

Specifically, the third processing module 603 may determine the position of each participle in the sentence text during word segmentation and part-of-speech tagging, and, after determining the core word of the sentence text according to the syntactic dependency relationships, calculate the distance between each participle and the core word, to obtain the word position feature and the word distance feature corresponding to each participle, thereby obtaining a participle sample that includes at least the participle and its part-of-speech tag, the syntactic dependency relationship, the word position feature and the word distance feature.

The model training module 604 takes training participle samples corresponding to each participle in each sentence text as input of the keyword recognition model, takes keyword labels of the participles as output of the keyword recognition model, and trains the keyword recognition model.
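The training step can be sketched with a small stand-in learner. The passage above does not fix a model type, so the multiclass perceptron below is a swapped-in technique chosen only because it fits in a few lines of standard-library Python; the feature names, the toy samples and their labels are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Sample = Dict[str, object]


def featurize(sample: Sample) -> List[str]:
    """Turn a training participle sample into one-hot style feature names."""
    return [f"{key}={value}" for key, value in sorted(sample.items())]


def train(samples: List[Sample], labels: List[int],
          epochs: int = 20) -> Callable[[Sample], int]:
    """Train a multiclass perceptron: one weight vector per keyword label."""
    classes = sorted(set(labels))
    weights = {c: defaultdict(float) for c in classes}

    def predict(sample: Sample) -> int:
        feats = featurize(sample)
        return max(classes, key=lambda c: sum(weights[c][f] for f in feats))

    for _ in range(epochs):
        mistakes = 0
        for sample, gold in zip(samples, labels):
            guess = predict(sample)
            if guess != gold:
                # Move weights toward the gold label, away from the guess.
                mistakes += 1
                for f in featurize(sample):
                    weights[gold][f] += 1.0
                    weights[guess][f] -= 1.0
        if mistakes == 0:
            break  # every training sample is classified correctly
    return predict


# Toy training participle samples (input) and keyword labels (output).
train_samples: List[Sample] = [
    {"pos": "n", "dep": "SBV", "distance": 2},
    {"pos": "d", "dep": "ADV", "distance": 1},
    {"pos": "v", "dep": "HED", "distance": 0},
    {"pos": "r", "dep": "SBV", "distance": 3},
]
train_labels = [1, 2, 1, 0]
model = train(train_samples, train_labels)
```

In practice any classifier that accepts the same feature dimensions at training and prediction time could take the place of this perceptron.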

As can be seen, first, the embodiments of the present description introduce the participles, their parts of speech and the syntactic dependency relationships to train the model, so that the model learns how syntactic dependency and sentence semantics affect keyword extraction, which improves the keyword extraction accuracy of the model; second, features such as word position and word distance are also fed into model training, so that the overall context structure of the sentence is better taken into account and the keyword extraction capability of the model is enhanced; third, key level labels are introduced for model training, so that the model can not only find the keywords in the sentence text but also give their key levels, providing data support for subsequent processing.

Moreover, as for the above device embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to part of the description of the method embodiment. It should be noted that, in the respective components of the apparatus of the present specification, the components therein are logically divided according to the functions to be implemented thereof, but the present specification is not limited thereto, and the respective components may be newly divided or combined as necessary.

Fig. 7 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure, and referring to fig. 7, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile memory, and may also include hardware required by other services. The processor reads the corresponding computer program from the nonvolatile memory to the memory and then runs the computer program to form the text keyword processing device on the logic level. Of course, besides the software implementation, the present specification does not exclude other implementations, such as logic devices or a combination of software and hardware, and the like, that is, the execution subject of the following processing flow is not limited to each logic unit, and may be hardware or logic devices.

The network interface, the processor and the memory may be interconnected by a bus system. The bus may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 7, but this does not indicate only one bus or one type of bus.

The memory is used for storing programs. In particular, a program may include program code comprising computer operating instructions. The memory may include both read-only memory and random access memory, and provides instructions and data to the processor. The memory may include a random access memory (RAM) and may also include a non-volatile memory, such as at least one disk memory.

The processor is used for executing the program stored in the memory and specifically executing:

The method performed by the text keyword processing apparatus or the manager (Master) node according to the embodiments disclosed in fig. 5 to 6 of the present specification can be applied to or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the various methods, steps and logic blocks disclosed in the embodiments of the present specification. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present specification may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory, and the processor reads information in the memory and completes the steps of the above method in combination with its hardware.

The text keyword processing apparatus may also perform the methods of fig. 2-4 and implement the methods performed by the administrator node.

Based on the same inventive concept, the present specification also provides a computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to execute the text keyword processing method provided by the embodiments corresponding to fig. 2 to 4.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The above description is only an example of the present specification, and is not intended to limit the present specification. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.

Claims

1. A text keyword processing method comprises the following steps:

2. The method of claim 1, wherein the keyword labels comprise key labels and non-key labels, which respectively indicate that the corresponding participle is a keyword or a non-keyword;

wherein obtaining the keyword information of the sentence text based on each participle and the corresponding keyword label comprises:

determining, among the participles, key participles corresponding to the key labels;

and obtaining the keyword information of the sentence text based on the key participles.

3. The method of claim 2, the key labels comprising a plurality of key level labels for characterizing a plurality of key levels of a participle;

obtaining keyword information of the sentence text based on the key participles, wherein the obtaining of the keyword information of the sentence text based on the key participles comprises:

determining the key level of each key word segmentation based on the key level label corresponding to each key word segmentation;

and obtaining the keyword information of the sentence text based on the key participles and the corresponding key levels.

4. The method of claim 3, wherein the key level labels corresponding to the training participle samples comprise:

5. The method of claim 4, wherein the key level labels corresponding to the training participle samples further comprise:

6. The method of claim 1, the training participle sample further comprising: the position of the participle in the sentence text.

7. The method of claim 6, the training participle sample further comprising: a word distance feature of the participle, which represents the distance between the position of each participle and the position of the core word in the same sentence text, the core word being the dominant participle in the syntactic dependency relationships of the participles.

8. A text keyword processing method comprises the following steps:

9. A text keyword processing apparatus comprising:

10. A text keyword processing apparatus comprising:

11. An electronic device, comprising:

a processor; and

12. A computer-readable storage medium having a computer program stored thereon, which when executed by a processor, performs the operations of:
