CN112580349A

CN112580349A - Phrase extraction method and device and electronic equipment

Info

Publication number: CN112580349A
Application number: CN202011558253.2A
Authority: CN
Inventors: 李雪婷; 简仁贤; 吴文杰; 刘影
Original assignee: Emotibot Technologies Ltd
Current assignee: Emotibot Technologies Ltd
Priority date: 2020-12-24
Filing date: 2020-12-24
Publication date: 2021-03-30
Anticipated expiration: 2040-12-24
Also published as: CN112580349B

Abstract

The application provides a phrase extraction method and device and electronic equipment. The method comprises: obtaining a sentence to be processed; sequentially performing word segmentation, part-of-speech tagging and dependency syntax processing on the sentence to generate dependency relationship labels among different words and a part-of-speech label for each word; judging, according to these labels, whether the core relation word carrying the core relation label is a verb; if the core relation word is a verb, searching for a target word that forms a specified dependency relationship with the core relation word; and determining, according to the label information of the target word, whether to merge and output the core relation word and the target word. With this scheme, a computer can automatically extract phrases according to fixed rules based on the parts of speech and dependency relationships of the segmented words, thereby improving the efficiency and accuracy of phrase extraction.

Description

Phrase extraction method and device and electronic equipment

Technical Field

The present application relates to the field of natural language processing technologies, and in particular, to a phrase extraction method and apparatus, and an electronic device.

Background

The recognition and analysis of basic phrases is one of the important tasks of shallow syntactic analysis of natural language. The result of basic phrase analysis can simplify the structure of a sentence and reduce the complexity of syntactic analysis. As a partial analysis result with high certainty, basic phrase analysis can also resolve most locally ambiguous structures, thereby laying a foundation for deeper chunk analysis and complete syntactic analysis. For example, in existing natural language processing technology, Chinese phrase extraction is of great help for coarse-grained word segmentation, keyword extraction, information extraction, and the like.

At present, existing Chinese phrase extraction methods mainly start from a training corpus; such methods consume manpower and also face the problem that the accuracy reaches a critical point beyond which it is difficult to improve.

Disclosure of Invention

The embodiment of the application provides a phrase extraction method, which is used for reducing labor cost and improving extraction efficiency.

The embodiment of the application provides a phrase extraction method, which comprises the following steps:

obtaining a sentence to be processed;

performing word segmentation, part-of-speech tagging and dependency syntax processing on the sentence to be processed in sequence to generate dependency relationship labels among different words and part-of-speech labels of each word;

judging whether the core relation word with the core relation label is a verb or not according to the dependency relation labels among different words and the part-of-speech label of each word;

if the core relation word is a verb, searching a target word which forms an appointed dependency relationship with the core relation word;

and determining whether to perform merging output of the core relation words and the target words or not according to the label information of the target words.

In one embodiment, the searching for the target word constituting the specified dependency relationship with the core relation word includes:

searching for a target word having an adverbial structure relationship (ADV) with the core relation word according to the dependency relationship labels among different words;

the determining whether to perform the merging output of the core relation words and the target words according to the tag information of the target words includes:

and if the part of speech of the target word is an adverb and is adjacent to the core relation word, merging and outputting the core relation word and the target word.

In an embodiment, if the part of speech of the target word is an adverb and is adjacent to the core relation word, merging and outputting the core relation word and the target word, including:

if the part of speech of the target word is an adverb and is adjacent to the core relation word, judging whether the target word has a child node;

if the target word has child nodes, merging and outputting the core relation word, the target word and the vocabularies corresponding to the child nodes.

In an embodiment, the determining whether to perform merging output of the core relation word and the target word according to the tag information of the target word includes:

if the part of speech of the target word is an adverb and is not adjacent to the core relation word, judging whether the target word has a child node;

and if the target word has child nodes, merging and outputting the target word and the vocabularies corresponding to the child nodes.

and if the part of speech of the target word is a preposition, judging the part of speech of an object forming a preposition-object relationship with the target word; if the object is a verb, taking the object as a core word, and determining whether to merge and output the core word and a word according to the part of speech of the word that forms the specified dependency relationship with the core word.

searching for a target word having a verb-complement structure relationship (CMP) with the core relation word according to the dependency relationship labels among different words;

and if the part of speech of the target word is an adjective, combining and outputting the core relation word and the target word.

In an embodiment, if the part of speech of the target word is an adjective, merging and outputting the core relation word and the target word, including:

if the part of speech of the target word is an adjective, judging whether the target word has a child node;

and if the part of speech of the target word is a verb, taking the target word as a core word, and determining whether to carry out merging output of the core word and the vocabulary according to the part of speech of the vocabulary which forms the specified dependency relationship with the core word.

searching for target words having a subject-predicate relationship (SBV), a verb-object relationship (VOB) or a fronted-object relationship (FOB) with the core relation word;

if the part of speech of the target word is a noun, pronoun or numeral, judging whether the target word has child nodes;

searching for target words having an indirect-object relationship (IOB) or a pivotal relationship (DBL) with the core relation word;

judging whether the target word has child nodes or not according to the label information of the target word;

In an embodiment, the method further comprises: and outputting the corresponding part of speech of the phrase at the same time of outputting the phrase.

An embodiment of the present application further provides a phrase extraction apparatus, where the apparatus includes:

the sentence acquisition module is used for acquiring sentences to be processed;

the tag generation module is used for sequentially carrying out word segmentation, part of speech tagging and dependency syntax processing on the sentence to be processed to generate dependency relationship tags among different words and part of speech tags of each word;

the part-of-speech judging module is used for judging whether the core relation word with the core relation label is a verb or not according to the dependency relation labels among different words and the part-of-speech label of each word;

the target word searching module is used for searching a target word which forms an appointed dependency relationship with the core relation word when the core relation word is a verb;

and the merging judgment module is used for determining whether to merge and output the core relation words and the target words according to the label information of the target words.

An embodiment of the present application provides an electronic device, which includes:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the above phrase extraction method.

Embodiments of the present application provide a computer-readable storage medium, which stores a computer program, where the computer program is executable by a processor to perform the above-mentioned phrase extraction method.

According to the technical scheme provided by the embodiment of the application, word segmentation, part-of-speech tagging and dependency syntax processing are performed on the sentence to be processed to generate dependency relationship labels among different words and a part-of-speech label for each word, and it is then judged whether the core relation word carrying the core relation label is a verb. When the core relation word is a verb, a target word that forms a specified dependency relationship with the core relation word is searched for, and whether the core relation word and the target word are merged and output is determined according to the label information of the target word. Phrase extraction therefore no longer needs to be carried out manually by language intuition; instead, phrases are extracted automatically by a computer according to fixed rules based on the parts of speech and dependency relationships of the segmented words, which improves the efficiency and accuracy of phrase extraction.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required to be used in the embodiments of the present application will be briefly described below.

Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of a phrase extraction method according to an embodiment of the present application;

Fig. 3 is a detailed flowchart of a phrase extraction method according to an embodiment of the present application;

Fig. 4 is a block diagram of a phrase extraction apparatus according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.

Like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.

Fig. 1 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. The electronic device 100 may be configured to perform the phrase extraction method provided in the embodiments of the present application. As shown in fig. 1, the electronic device 100 includes: one or more processors 102, and one or more memories 104 storing processor-executable instructions. Wherein the processor 102 is configured to execute the phrase extraction method provided in the following embodiments of the present application.

The processor 102 may be a gateway, or may be an intelligent terminal, or may be a device including a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or other form of processing unit having data processing capability and/or instruction execution capability, and may process data of other components in the electronic device 100, and may control other components in the electronic device 100 to perform desired functions.

The memory 104 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by processor 102 to implement the phrase extraction method described below. Various applications and various data, such as various data used and/or generated by the applications, may also be stored in the computer-readable storage medium.

In one embodiment, the electronic device 100 shown in FIG. 1 may also include an input device 106, an output device 108, and a data acquisition device 110, which are interconnected via a bus system 112 and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 1 are exemplary only, and not limiting, and the electronic device 100 may have other components and structures as desired.

The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like. The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like. The data acquisition device 110 may acquire an image of a subject and store the acquired image in the memory 104 for use by other components. Illustratively, the data acquisition device 110 may be a camera.

In one embodiment, the components of the example electronic device 100 for implementing the phrase extraction method of the embodiments of the present application may be integrated or distributed, such as integrating the processor 102, the memory 104, the input device 106 and the output device 108, and separately arranging the data acquisition device 110.

In an embodiment, the example electronic device 100 for implementing the phrase extraction method of the embodiments of the present application may be implemented as a smart terminal such as a smartphone, a tablet computer, a desktop computer, and the like.

Fig. 2 is a schematic flowchart of a phrase extraction method provided in an embodiment of the present application. The method may be performed by an electronic device such as a computer. As shown in fig. 2, the method includes the following steps S210 to S250.

Step S210: acquiring a sentence to be processed.

A phrase is a language unit without sentence intonation, formed by combining language units that can collocate with one another on the three levels of syntax, semantics and pragmatics; it is also called a word group. It is a grammatical unit that is larger than a word but is not a sentence. The scheme provided by the embodiment of the application can extract phrases from the sentence to be processed.

For example, the sentence to be processed may be: "Wang, vice minister of the Logistics Department, on the 25th summoned Li, the ambassador of Country A to China, and lodged solemn representations and a strong protest over a certain organization's passing of the so-called 'economy and trade act of a certain region', urging a certain party to immediately correct its errors." The phrases may be "Wang, vice minister of the Logistics Department", "Li, the ambassador of Country A to China", and so on.

Step S220: performing word segmentation, part-of-speech tagging and dependency syntax processing on the sentence to be processed in sequence to generate dependency relationship labels among different words and part-of-speech labels of each word.

In an embodiment, the ICTCLAS word segmenter may be used to segment the sentence to be processed into words. The ICTCLAS word segmenter may then be used to perform part-of-speech tagging on the segmented words, that is, to attach a corresponding part-of-speech tag to each word: nouns are tagged n, time words nt, numerals m, quantifiers q, pronouns r, verbs v, adjectives a, adverbs d, prepositions p, personal names nr, place names ns, names of organizations ni, the punctua.

For example, after word segmentation and part-of-speech tagging, the sentence listed above becomes: "Logistics Department_ni vice minister_n Wang_nh 25th_nt summon_v Country A_ns stationed in_n China_ns ambassador_n Li_nh ,_w regarding_p a certain organization_ni pass_v so-called_a "_w a certain region_ns economy_n and_c trade_n act_n "_w put forward_v solemn_a representations_n and_c strong_a protest_n ,_w urge_v a certain party_n immediately_d correct_v error_n ._w".

After the part-of-speech tag of each word is obtained, the dependency relationships between different words can be analyzed with an existing dependency syntax processing tool (such as LTP), and corresponding labels are attached to the different words based on these dependency relationships. As shown in Table 1 below, the dependency relationships between words may include a subject-predicate relationship, a verb-object relationship, an indirect-object relationship, and so on.

TABLE 1 Label definitions of dependency relationships

Type of relationship	Label
Subject-predicate relationship	SBV
Verb-object relationship	VOB
Indirect-object relationship	IOB
Fronted object	FOB
Pivotal (double) construction	DBL
Attributive relationship	ATT
Adverbial structure	ADV
Verb-complement structure	CMP
Coordinate relationship	COO
Preposition-object relationship	POB
Left adjunct relationship	LAD
Right adjunct relationship	RAD
Punctuation	WP
Core (head) relationship	HED
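As a minimal sketch, the labels of Table 1 can be held in a lookup table. The English relation names are this sketch's own glosses, and the names `DEP_LABELS`, `SPECIFIED_RELATIONS` and `describe` are hypothetical helpers that do not appear in the application.

```python
# Dependency labels from Table 1; the English names are illustrative glosses.
DEP_LABELS = {
    "SBV": "subject-predicate",
    "VOB": "verb-object",
    "IOB": "indirect object",
    "FOB": "fronted object",
    "DBL": "pivotal (double) construction",
    "ATT": "attributive",
    "ADV": "adverbial",
    "CMP": "verb-complement",
    "COO": "coordinate",
    "POB": "preposition-object",
    "LAD": "left adjunct",
    "RAD": "right adjunct",
    "WP": "punctuation",
    "HED": "core (head)",
}

# Relations that the method inspects on the child nodes of the core word.
SPECIFIED_RELATIONS = (
    "SBV", "VOB", "IOB", "FOB", "DBL", "ADV", "CMP", "COO", "POB",
)


def describe(label: str) -> str:
    """Return a readable name for a dependency label, or 'unknown'."""
    return DEP_LABELS.get(label, "unknown")
```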

For the above listed sentences to be processed, the part-of-speech tags and dependency tags as shown in table 2 below can be obtained.

TABLE 2 Part-of-speech and dependency labels for each word in the sentence to be processed

Serial number	Word	Part of speech	Parent node	Dependency relationship
1	Logistics Department	ni	2	ATT
2	vice minister	n	3	ATT
3	Wang	nh	5	SBV
4	25th	nt	5	ADV
5	summon	v	0	HED
6	Country A	ns	9	ATT
7	stationed in	n	9	ATT
8	China	ns	7	VOB
9	ambassador	n	10	ATT
10	Li	nh	5	VOB
11	，	wp	5	WP
12	regarding	p	23	ADV
13	a certain organization	ni	14	SBV
14	pass	v	12	POB
15	so-called	a	21	ATT
16	“	wp	21	WP
17	a certain region	ns	21	ATT
18	economy	n	21	ATT
19	and	c	20	LAD
20	trade	n	18	COO
21	act	n	14	VOB
22	”	wp	21	WP
23	put forward	v	5	COO
24	solemn	a	25	ATT
25	representations	n	23	VOB
26	and	c	28	LAD
27	strong	a	28	ATT
28	protest	n	25	COO
29	，	wp	23	WP
30	urge	v	23	COO
31	a certain party	n	30	DBL
32	immediately	d	34	ADV
33	correct	v	30	VOB
34	error	n	33	VOB
35	。	wp	30	WP

As shown in Table 2 above, "Logistics Department" (serial number 1) is an organization name, "vice minister" (serial number 2) is a noun, and there is an attributive relationship (ATT) between "Logistics Department" and "vice minister". The core relation word is "summon" (serial number 5).
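One possible in-memory form of such a table is sketched below. The `Token` class and `children_index` function are hypothetical names, and the English glosses are illustrative; only the serial numbers, tags, parent nodes and labels are taken from Table 2.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int   # serial number (1-based)
    word: str  # English gloss, illustrative only
    pos: str   # part-of-speech tag
    head: int  # serial number of the parent node, 0 for the root
    rel: str   # dependency label toward the parent


# Rows 1-5 and 10 of Table 2.
ROWS = [
    Token(1, "Logistics Department", "ni", 2, "ATT"),
    Token(2, "vice minister", "n", 3, "ATT"),
    Token(3, "Wang", "nh", 5, "SBV"),
    Token(4, "25th", "nt", 5, "ADV"),
    Token(5, "summon", "v", 0, "HED"),
    Token(10, "Li", "nh", 5, "VOB"),
]


def children_index(tokens):
    """Map each parent serial number to the serial numbers of its direct children."""
    index = defaultdict(list)
    for tok in tokens:
        index[tok.head].append(tok.idx)
    return dict(index)


CHILDREN = children_index(ROWS)
```

With this index, the direct children of the core word "summon" are found under key 5, and the root itself is found under key 0.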

Step S230: judging whether the core relation word with the core relation label is a verb or not according to the dependency relation labels among different words and the part-of-speech label of each word.

Taking table 2 above as an example, the core relation word (also called HED node) with the core relation label is "summons", and the core relation word can be regarded as the core of the sentence to be processed. According to the part-of-speech tag of the 'summons', whether the 'summons' is a verb can be judged. If the HED node is not a verb, the pending sentence may be placed into the invalid sentence set. The embodiment of the application mainly aims at the sentence with the verb as the sentence core.
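The check in step S230 can be sketched as follows. The dictionary layout, the function names and the exact comparison with the tag `v` are assumptions of this sketch; a real tag set may use several verb sub-tags.

```python
def find_core(tokens):
    """Return the token carrying the HED label, or None if there is none."""
    for tok in tokens:
        if tok["rel"] == "HED":
            return tok
    return None


def core_if_verb(tokens, invalid_sentences):
    """Return the core word if it is a verb; otherwise file the sentence as invalid."""
    core = find_core(tokens)
    if core is None or core["pos"] != "v":  # assumption: verbs are tagged exactly "v"
        invalid_sentences.append(tokens)
        return None
    return core


# A verb-core sentence (from Table 2) and a hypothetical noun-core fragment.
verb_sentence = [
    {"idx": 3, "word": "Wang", "pos": "nh", "head": 5, "rel": "SBV"},
    {"idx": 5, "word": "summon", "pos": "v", "head": 0, "rel": "HED"},
]
noun_fragment = [
    {"idx": 1, "word": "solemn", "pos": "a", "head": 2, "rel": "ATT"},
    {"idx": 2, "word": "representations", "pos": "n", "head": 0, "rel": "HED"},
]

invalid = []
core_a = core_if_verb(verb_sentence, invalid)
core_b = core_if_verb(noun_fragment, invalid)
```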

Step S240: if the core relation word is a verb, searching for a target word which forms a specified dependency relationship with the core relation word.

In one embodiment, the target words forming the specified dependency relationship with the core relation words may be words having a SBV/VOB/IOB/FOB/DBL/ADV/CMP/COO/POB relationship with the core relation words, which may be regarded as primary child nodes of the first level of the HED node, and may be referred to as SBV node, VOB node, IOB node, FOB node, DBL node, ADV node, CMP node, COO node, POB node in turn. For the purpose of distinction, the searched vocabulary having any one of the above relations with the core relation words is called a target word.

Taking Table 2 above as an example, "summon" of serial number 5 has an SBV relationship with "Wang" of serial number 3, an ADV relationship with "25th" of serial number 4, a VOB relationship with "Li" of serial number 10, and a COO relationship with "put forward" of serial number 23. The target words are therefore "Wang" (the SBV node), "25th" (the ADV node), "Li" (the VOB node) and "put forward" (the COO node).
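The search in step S240 amounts to filtering the direct children of the core word by label. The sketch below uses rows of Table 2 around serial number 5; the tuple layout `(serial number, gloss, tag, parent, label)` and the name `find_targets` are assumptions made for illustration.

```python
SPECIFIED = {"SBV", "VOB", "IOB", "FOB", "DBL", "ADV", "CMP", "COO", "POB"}

# (serial number, gloss, part of speech, parent node, dependency label)
ROWS = [
    (3, "Wang", "nh", 5, "SBV"),
    (4, "25th", "nt", 5, "ADV"),
    (5, "summon", "v", 0, "HED"),
    (10, "Li", "nh", 5, "VOB"),
    (11, ",", "wp", 5, "WP"),
    (23, "put forward", "v", 5, "COO"),
    (25, "representations", "n", 23, "VOB"),
]


def find_targets(tokens, core_idx):
    """Direct children of the core word whose label is one of the specified relations."""
    return [
        (idx, word, rel)
        for idx, word, _pos, head, rel in tokens
        if head == core_idx and rel in SPECIFIED
    ]


targets = find_targets(ROWS, 5)
```

The punctuation child (label WP) and the grandchild at serial number 25 are both excluded, which matches the four target words listed above.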

Step S250: determining whether to perform merging output of the core relation words and the target words or not according to the label information of the target words.

The tag information of the target word may include a part-of-speech tag of the target word and dependency tags with other words. Merged output refers to output that is combined together as a phrase.

In one embodiment, the computer may search for a target word having an adverbial structure relationship (ADV) with the core relation word according to the dependency relationship labels among different words; then, according to the label information of the target word, if the target word is an adverb and is adjacent to the core relation word, the core relation word and the target word are merged and output. In an embodiment, it may further be judged whether the target word has child nodes; if so, the core relation word, the target word and the words corresponding to the child nodes are merged and output.

If a first word modifies a second word, the first word may be regarded as a child node of the second word. Taking the sentence listed above as an example, Table 2 shows that "summon" (serial number 5) is the core relation node, and its child nodes include "Wang" (serial number 3), "25th" (serial number 4), "Li" (serial number 10) and "put forward" (serial number 23). The child node of "Wang" is "vice minister" (serial number 2), and the child node of "vice minister" is "Logistics Department" (serial number 1). "Logistics Department" can therefore also be regarded as a child node of "Wang".
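The transitive notion of child node described here can be sketched as a tree walk over the parent-node column. The function names are hypothetical, the glosses are illustrative, and joining the merged words in sentence order with a space separator is an assumption of this sketch (Chinese text would normally be joined without a separator).

```python
# (serial number, gloss, parent node) for rows 1-5 of Table 2, in sentence order.
ROWS = [
    (1, "Logistics Department", 2),
    (2, "vice minister", 3),
    (3, "Wang", 5),
    (4, "25th", 5),
    (5, "summon", 0),
]


def descendants(tokens, idx):
    """Serial numbers of every node below idx in the dependency tree."""
    found = []
    stack = [idx]
    while stack:
        parent = stack.pop()
        for tok_idx, _word, head in tokens:
            if head == parent:
                found.append(tok_idx)
                stack.append(tok_idx)
    return sorted(found)


def merge_with_descendants(tokens, idx, sep=" "):
    """Join a node and all nodes below it, keeping the original word order."""
    keep = set(descendants(tokens, idx)) | {idx}
    return sep.join(word for tok_idx, word, _head in tokens if tok_idx in keep)


phrase = merge_with_descendants(ROWS, 3)
```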

In one embodiment, the computer may search for a target word that has an ADV relationship with the core relation word, and merge and output the two if the target word is an adverb and is immediately adjacent to the core relation word. If the target word has child nodes, the core relation word, the target word and all child nodes of the target word may be merged and output; if the target word has no child nodes, only the core relation word and the target word are merged. Conversely, if the target word is an adverb but is not adjacent to the core relation word, it is not merged with the core relation word for output.
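A minimal sketch of the adverb rule follows. It assumes that "adjacent" means the serial numbers differ by one, and that in the non-adjacent case the adverb is output together with its own child nodes only, as in the summary above. The two three-word sentences are hypothetical examples, not taken from the application.

```python
def subtree(tokens, idx):
    """Serial numbers of idx and of every node below it."""
    keep = {idx}
    changed = True
    while changed:
        changed = False
        for tok_idx, _word, _pos, head, _rel in tokens:
            if head in keep and tok_idx not in keep:
                keep.add(tok_idx)
                changed = True
    return keep


def adverbial_phrase(tokens, core_idx, target_idx):
    """Apply the ADV/adverb rule; return the merged phrase or None if it does not apply."""
    by_idx = {tok[0]: tok for tok in tokens}
    _idx, _word, pos, head, rel = by_idx[target_idx]
    if not (rel == "ADV" and head == core_idx and pos == "d"):
        return None
    keep = subtree(tokens, target_idx)
    if abs(target_idx - core_idx) == 1:  # assumption: adjacency by serial number
        keep.add(core_idx)
    return " ".join(by_idx[i][1] for i in sorted(keep))


# Hypothetical sentences: (serial number, gloss, tag, parent, label).
ADJACENT = [
    (1, "he", "r", 3, "SBV"),
    (2, "immediately", "d", 3, "ADV"),
    (3, "corrects", "v", 0, "HED"),
]
SEPARATED = [
    (1, "immediately", "d", 3, "ADV"),
    (2, "he", "r", 3, "SBV"),
    (3, "corrects", "v", 0, "HED"),
]
```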

In an embodiment, the computer may search for a target word having an ADV relationship with the core relation word. If the part of speech of the target word is a preposition, the computer judges the part of speech of the object that forms a preposition-object relationship (POB) with the target word; if the object is a verb, the object is taken as a core word, and whether to merge and output the core word and a word is determined according to the part of speech of the word that forms the specified dependency relationship with the core word.

In this embodiment, the core word is the object of the target word, and the object is a verb. The object can then be treated as a sentence core, like the core relation word above, and the words that form the specified dependency relationship with the core word can be searched for by the same method. These may include words having an SBV/VOB/IOB/FOB/DBL/ADV/CMP/COO/POB relationship with the core word; they are regarded as child nodes of the core word and include SBV node words, VOB node words, IOB node words, FOB node words, DBL node words, ADV node words, CMP node words, COO node words and POB node words.
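The hand-over from a preposition to its verb object can be sketched with rows 12 to 14, 21 and 23 of Table 2. The function names and tuple layout are hypothetical and the glosses are illustrative; the sketch only locates the new core word and repeats the target search for it.

```python
SPECIFIED = {"SBV", "VOB", "IOB", "FOB", "DBL", "ADV", "CMP", "COO", "POB"}

# (serial number, gloss, part of speech, parent node, dependency label)
ROWS = [
    (12, "regarding", "p", 23, "ADV"),
    (13, "a certain organization", "ni", 14, "SBV"),
    (14, "pass", "v", 12, "POB"),
    (21, "act", "n", 14, "VOB"),
    (23, "put forward", "v", 5, "COO"),
]


def verb_object_of(tokens, prep_idx):
    """Serial number of the preposition's POB object if that object is a verb, else None."""
    for idx, _word, pos, head, rel in tokens:
        if head == prep_idx and rel == "POB" and pos == "v":
            return idx
    return None


def targets_of(tokens, core_idx):
    """Direct children of a core word that carry one of the specified relations."""
    return [
        (idx, rel)
        for idx, _word, _pos, head, rel in tokens
        if head == core_idx and rel in SPECIFIED
    ]


new_core = verb_object_of(ROWS, 12)        # the object "pass" becomes the core word
new_targets = targets_of(ROWS, new_core)   # same search as for the original core
```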

In addition to the adverbial structure relationship, the computer can search for a target word having a verb-complement structure relationship (CMP) with the core relation word according to the dependency relationship labels among different words; then, according to the label information (including the part-of-speech label) of the target word, if the target word is an adjective, the core relation word and the target word are merged and output. If the target word is an adjective, it is further judged whether the target word has child nodes; if so, the core relation word, the target word and the words corresponding to the child nodes are merged and output. Conversely, if there is no child node, only the core relation word and the target word are merged and output.
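The adjective-complement rule can be sketched in the same style. The four-word sentence is a hypothetical example and the function names are not from the application; merged words are joined in sentence order, which is an assumption of this sketch.

```python
def subtree(tokens, idx):
    """Serial numbers of idx and of every node below it."""
    keep = {idx}
    changed = True
    while changed:
        changed = False
        for tok_idx, _word, _pos, head, _rel in tokens:
            if head in keep and tok_idx not in keep:
                keep.add(tok_idx)
                changed = True
    return keep


def complement_phrases(tokens, core_idx):
    """Merge the core word with each adjective CMP child and that child's own children."""
    by_idx = {tok[0]: tok for tok in tokens}
    phrases = []
    for idx, _word, pos, head, rel in tokens:
        if head == core_idx and rel == "CMP" and pos == "a":
            keep = subtree(tokens, idx) | {core_idx}
            phrases.append(" ".join(by_idx[i][1] for i in sorted(keep)))
    return phrases


# Hypothetical sentence: (serial number, gloss, tag, parent, label).
SENTENCE = [
    (1, "she", "r", 2, "SBV"),
    (2, "washed", "v", 0, "HED"),
    (3, "very", "d", 4, "ADV"),
    (4, "clean", "a", 2, "CMP"),
]

result = complement_phrases(SENTENCE, 2)
```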

In other embodiments, the computer searches for a target word having a verb-complement structure relationship with the core relation word; if the target word is a verb, the target word can be taken as a core word, and whether to merge and output the core word and a word is determined according to the part of speech of the word that forms the specified dependency relationship with the core word.

In one embodiment, in addition to the adverbial structure relationship and the verb-complement structure relationship described above, the computer may also search for target words that have a subject-predicate relationship (SBV), a verb-object relationship (VOB) or a fronted-object relationship (FOB) with the core relation word. If the target word is a verb, the target word is taken as a core word, and whether to merge and output the core word and a word is determined according to the part of speech of the word that forms the specified dependency relationship with the core word.

If the target word is a noun, pronoun or numeral, the core relation word is not merged with the target word. It is then judged whether the target word has child nodes; if so, the target word and the words corresponding to the child nodes are merged and output.
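For a noun target, the rule can be sketched with rows 23 to 28 of Table 2, where "put forward" plays the role of the core word and "representations" is its VOB node. The tag set `{"n", "r", "m"}`, the function names and the glosses are assumptions of this sketch.

```python
NOMINAL_TAGS = {"n", "r", "m"}  # noun, pronoun, numeral

# (serial number, gloss, part of speech, parent node, dependency label)
ROWS = [
    (23, "put forward", "v", 5, "COO"),
    (24, "solemn", "a", 25, "ATT"),
    (25, "representations", "n", 23, "VOB"),
    (26, "and", "c", 28, "LAD"),
    (27, "strong", "a", 28, "ATT"),
    (28, "protest", "n", 25, "COO"),
]


def subtree(tokens, idx):
    """Serial numbers of idx and of every node below it."""
    keep = {idx}
    changed = True
    while changed:
        changed = False
        for tok_idx, _word, _pos, head, _rel in tokens:
            if head in keep and tok_idx not in keep:
                keep.add(tok_idx)
                changed = True
    return keep


def nominal_phrase(tokens, target_idx):
    """Merge a nominal target with its child nodes; the core word is left out."""
    by_idx = {tok[0]: tok for tok in tokens}
    pos = by_idx[target_idx][2]
    if pos not in NOMINAL_TAGS:
        return None
    keep = subtree(tokens, target_idx)
    text = " ".join(by_idx[i][1] for i in sorted(keep))
    return text, pos  # the phrase keeps the part of speech of the target word


phrase = nominal_phrase(ROWS, 25)
```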

In one embodiment, the computer may further search for a target word having an indirect-object relationship (IOB) or a pivotal relationship (DBL) with the core relation word; judge whether the target word has child nodes according to the label information of the target word; and, if the target word has child nodes, merge and output the target word and the words corresponding to the child nodes.

Fig. 3 is a detailed flowchart of a phrase extraction method provided in an embodiment of the present application. As shown in fig. 3, the method comprises the steps of:

Step S301, inputting a sentence to be processed.

Step S302, performing word segmentation, part-of-speech tagging and dependency syntax processing.

Step S303, judging whether the word at the HED node (the sentence core) is a verb; if so, proceeding to the following steps; if not, putting the sentence into the invalid sentence set.

Step S304, finding the first-level child node of the HED node labeled SBV, namely the SBV node; if it exists, judging the part of speech of the word at the SBV node (for convenience of description, the word at a node is referred to directly by the node); if not, proceeding to the next step S305;

if the SBV node is a noun, pronoun or numeral, the SBV node is not merged with the HED node. If the SBV node has child nodes, merging the SBV node with all child node words under the SBV node, keeping the part of speech of the SBV node, and outputting the result; if the SBV node has no child node, keeping the part of speech of the SBV node and outputting the result;

if the SBV node is a preposition, the SBV node is not merged with the HED node. The preposition and its part of speech are output, and the part of speech of the preposition's child node, the POB node, is judged. If it is a verb, the POB node is used as the sentence core, and steps S303 to S314 are re-executed for the sentence in which it is located. If it is another part of speech, the POB node is merged with all of its child nodes, the part of speech of the POB node is kept, and the result is output; if the POB node has no child node, the POB node and its part of speech are output;

if the SBV node is verb class, the SBV is not merged with the HED; the SBV node is used as a sentence core, and the sentence in which the SBV node is located re-executes the step S303 to the step S314;

if the SBV node is of another part of speech, the SBV is not merged with the HED. If the child node exists, combining the SBV node and all child nodes under the node, keeping the part of speech of the SBV node, and outputting a result; if no child node exists, the part of speech of the SBV node is kept, and the result is output.
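The four branches of step S304 can be sketched as one function that returns the phrases to output and the nodes to be reprocessed as new sentence cores. The function names, the tuple layout and the glosses are hypothetical, the rows are taken from Table 2, and the actual re-execution from step S303 is left to the caller.

```python
# (serial number, gloss, part of speech, parent node, dependency label)
ROWS = [
    (1, "Logistics Department", "ni", 2, "ATT"),
    (2, "vice minister", "n", 3, "ATT"),
    (3, "Wang", "nh", 5, "SBV"),
    (5, "summon", "v", 0, "HED"),
    (12, "regarding", "p", 23, "ADV"),
    (14, "pass", "v", 12, "POB"),
    (30, "urge", "v", 23, "COO"),
    (33, "correct", "v", 30, "VOB"),
]


def subtree_text(tokens, idx):
    """Join a node and every node below it, in sentence order."""
    keep = {idx}
    changed = True
    while changed:
        changed = False
        for tok_idx, _word, _pos, head, _rel in tokens:
            if head in keep and tok_idx not in keep:
                keep.add(tok_idx)
                changed = True
    return " ".join(word for tok_idx, word, *_rest in tokens if tok_idx in keep)


def process_child(tokens, idx):
    """Return (outputs, new_cores) for one child node; it is never merged with the HED node."""
    by_idx = {tok[0]: tok for tok in tokens}
    _idx, word, pos, _head, _rel = by_idx[idx]
    if pos == "v":
        return [], [idx]                  # verb: the node becomes a new sentence core
    if pos == "p":
        outputs, cores = [(word, pos)], []
        for child, _w, child_pos, head, rel in tokens:
            if head == idx and rel == "POB":
                if child_pos == "v":
                    cores.append(child)   # verb object: reprocess as a sentence core
                else:
                    outputs.append((subtree_text(tokens, child), child_pos))
        return outputs, cores
    # Nouns, pronouns, numerals and all other parts of speech: merge with child nodes.
    return [(subtree_text(tokens, idx), pos)], []
```

Under these assumptions, the SBV node at serial number 3 yields one phrase that keeps the tag `nh`, while the verb at serial number 33 yields no phrase and is queued as a new core.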

Step S305, finding VOB nodes, if yes, judging the part of speech of the VOB nodes; if not, go to the next step S306;

if the VOB node is a noun, pronoun or numeral, the VOB node is not merged with the HED node. If the VOB node has child nodes, merging the VOB node with all child nodes under the VOB node, keeping the part of speech of the VOB node, and outputting the result; if the VOB node has no child node, keeping the part of speech of the VOB node and outputting the result;

if the VOB node is a preposition, the VOB node is not merged with the HED node. The preposition and its part of speech are output, and the part of speech of the preposition's child node, the POB node, is judged. If it is a verb, the POB node is used as the sentence core, and steps S303 to S314 are re-executed for the sentence in which it is located. If it is another part of speech, the POB node is merged with all of its child nodes, the part of speech of the POB node is kept, and the result is output; if the POB node has no child node, the POB node and its part of speech are output;

if the VOB node is a verb, the VOB node is not merged with the HED node. The VOB node is used as the sentence core, and steps S303 to S314 are re-executed for the sentence in which it is located;

if the VOB node is of another part of speech, the VOB node is not merged with the HED node. If it has child nodes, merging the VOB node with all child nodes under the node, keeping the part of speech of the VOB node, and outputting the result; if it has no child node, keeping the part of speech of the VOB node and outputting the result.

Step S306: look for an IOB node. If one exists, it is not merged with the HED node; if not, go to step S307;

if the IOB node has child nodes, it is merged with all of its child nodes, the part of speech of the IOB node is kept, and the result is output;

if it has no child nodes, the IOB node is output with its own part of speech.

Step S307: look for an FOB node. If one exists, determine its part of speech; if not, go to step S308;

if the FOB node is a noun, pronoun or numeral/quantifier, it is not merged with the HED node. If the FOB node has child nodes, it is merged with all child nodes beneath it, the part of speech of the FOB node is kept, and the result is output; if it has no child nodes, the FOB node is output with its own part of speech;

if the FOB node is a preposition, it is not merged with the HED node. The preposition and its part of speech are output, and the part of speech of the POB node, which is the child node of the preposition, is then determined. If the POB node is a verb, it is taken as a sentence core and steps S303 to S314 are re-executed on the sentence. If it is of any other part of speech, the POB node is merged with all of its child nodes, the part of speech of the POB node is kept, and the result is output; if the POB node has no child nodes, the POB node and its part of speech are output;

if the FOB node is a verb, it is not merged with the HED node. The FOB node is taken as a sentence core, and steps S303 to S314 are re-executed on the sentence in which the FOB node is located;

if the FOB node is of any other part of speech, it is not merged with the HED node. If it has child nodes, the FOB node is merged with all child nodes beneath it, the part of speech of the FOB node is kept, and the result is output; if it has no child nodes, the FOB node is output with its own part of speech.

Step S308: look for a DBL node. If one exists, it is not merged with the HED node; if not, go to step S309;

if the DBL node has child nodes, it is merged with all of its child nodes, the part of speech of the DBL node is kept, and the result is output;

if it has no child nodes, the DBL node is output with its own part of speech.

Step S309: look for an ADV node. If one exists, determine its part of speech; if not, go to step S310;

if the ADV node is an adverb, determine whether it is immediately adjacent to the HED node. If it is adjacent, check whether the ADV node has child nodes: if so, the ADV node is first merged with its child nodes and the result is then merged with the HED node, the verb part of speech of the HED node is kept, and the result is output; if not, the ADV node is merged with the HED node, the verb part of speech is kept, and the result is output. If the ADV node is not adjacent to the HED node, it is not merged with the HED node; check whether the ADV node has child nodes: if so, the ADV node is merged with its child nodes, the part of speech of the ADV node is kept, and the result is output; if not, the ADV node is output with its own part of speech;

if the ADV node is a preposition, it is not merged with the HED node. The preposition and its part of speech are output, and the part of speech of the POB node, which is the child node of the preposition, is then determined. If the POB node is a verb, it is taken as a sentence core and steps S303 to S314 are re-executed on the sentence. If it is of any other part of speech, the POB node is merged with all of its child nodes, the part of speech of the POB node is kept, and the result is output; if the POB node has no child nodes, the POB node and its part of speech are output;

if the ADV node is of any other part of speech, it is not merged with the HED node. If it has child nodes, the ADV node is merged with all child nodes beneath it, the part of speech of the ADV node is kept, and the result is output; if it has no child nodes, the ADV node is output with its own part of speech.
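The adverb branch of step S309 is the first rule that merges a child node into the sentence core. A minimal sketch follows; the `Tok` record and function names are hypothetical, and "adjacent" is assumed here to mean that the two word positions differ by exactly one, which the text does not state explicitly.

```python
from collections import namedtuple

Tok = namedtuple("Tok", "idx word pos head rel")


def with_children(tokens, idx):
    """Indices of a node and of every node beneath it."""
    members = {idx}
    for t in tokens:
        if t.head == idx:
            members |= with_children(tokens, t.idx)
    return members


def adverb_output(tokens, hed, adv):
    """Apply the adverb branch of step S309; returns (phrase, merged_with_hed)."""
    by_idx = {t.idx: t for t in tokens}
    members = with_children(tokens, adv.idx)
    adjacent = abs(adv.idx - hed.idx) == 1   # assumed meaning of "adjacent"
    if adjacent:
        members.add(hed.idx)                 # the merged phrase keeps the verb POS
    pos = hed.pos if adjacent else adv.pos
    text = " ".join(by_idx[i].word for i in sorted(members))
    return f"{text}_{pos}", adjacent


# Two words taken from the second worked example (simplified English glosses).
tokens = [
    Tok(10, "timely", "d", 11, "ADV"),
    Tok(11, "monitor", "v", 0, "HED"),
]
print(adverb_output(tokens, tokens[1], tokens[0]))
```

The returned flag lets a caller record that the sentence core has already been output as part of a merged phrase, which matters for step S314.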

Step S310: look for a CMP node. If one exists, determine its part of speech; if not, go to step S311;

if the CMP node is a verb, it is not merged with the HED node; the CMP node is taken as a sentence core, and steps S303 to S314 are re-executed on the sentence;

if the CMP node is an adjective, check whether it has child nodes. If so, the CMP node is merged with all child nodes beneath it and the result is then merged with the HED node, the part of speech of the HED node is kept, and the result is output; if it has no child nodes, the CMP node is merged with the HED node, the part of speech of the HED node is kept, and the result is output;

if the CMP node is of any other part of speech, it is not merged with the HED node. If it has child nodes, the CMP node is merged with all child nodes beneath it, the part of speech of the CMP node is kept, and the result is output; if it has no child nodes, the CMP node is output with its own part of speech.

Step S311: look for a POB node. If one exists, determine its part of speech; if not, go to step S312;

if the POB node is a verb, it is not merged with the HED node; the POB node is taken as a sentence core, and steps S303 to S314 are re-executed on the sentence;

if the POB node is of any other part of speech, it is not merged with the HED node. If it has child nodes, the POB node is merged with all of its child nodes, the part of speech of the POB node is kept, and the result is output; if it has no child nodes, the POB node is output with its own part of speech.

Step S312: look for a COO node. If one exists, determine its part of speech; if not, go to step S313;

if the COO node is a verb, it is not merged with the HED node; the COO node is taken as a sentence core, and steps S303 to S314 are re-executed on the sentence;

if the COO node is of any other part of speech, the sentence to be processed is placed in the invalid-sentence subset.

Step S313: look for the second-level child nodes LAD (left adjunct relation) / RAD (right adjunct relation) of the HED node, and determine whether each has already been merged into a phrase in the preceding steps. If so, no further processing is needed; if not, the node and its part of speech are output directly.

Step S314: if the HED node has not been merged with other components and output, the HED node and its part of speech are output.
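Steps S313 and S314 depend on bookkeeping from the earlier steps: which words have already been absorbed into a phrase, and whether the sentence core has already been output in merged form. A minimal sketch of that bookkeeping, with hypothetical names (`finish_core`, `consumed`, `hed_merged`) and a made-up three-word toy sentence:

```python
from collections import namedtuple

Tok = namedtuple("Tok", "idx word pos head rel")


def finish_core(tokens, hed, consumed, hed_merged):
    """Outputs produced by steps S313 and S314 for one sentence core."""
    outputs = []
    first_level = {t.idx for t in tokens if t.head == hed.idx}
    # S313: LAD/RAD nodes two levels below the core that no phrase has absorbed.
    for t in tokens:
        if t.rel in ("LAD", "RAD") and t.head in first_level and t.idx not in consumed:
            outputs.append(f"{t.word}_{t.pos}")
    # S314: the core itself, unless an earlier step already merged and output it.
    if not hed_merged:
        outputs.append(f"{hed.word}_{hed.pos}")
    return outputs


# Toy sentence: "and" hangs off the coordinated verb "sold", which hangs off the core.
tokens = [
    Tok(1, "bought", "v", 0, "HED"),
    Tok(2, "and", "c", 3, "LAD"),
    Tok(3, "sold", "v", 1, "COO"),
]
print(finish_core(tokens, tokens[0], consumed=set(), hed_merged=False))
```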

Still taking the sentence to be processed given above as an example, the tag information shown in Table 2 is obtained. The phrase extraction process is as follows:

1. Flow with "summon" as the sentence core;

1.1. Find the first-level child nodes of "summon", i.e. any SBV/VOB/IOB/FOB/DBL/ADV/CMP/COO/POB nodes:

SBV: a certain Wang;

VOB: a certain Li;

ADV: the 25th;

COO: propose;

1.2. Determine the part of speech of the SBV node;

a certain Wang_nh is a noun, so it is not merged with the HED node;

"a certain Wang" has child nodes, so all nodes beneath it are merged: its child node "deputy minister" and the child node "logistics department" of "deputy minister";

output: deputy minister of the logistics department, a certain Wang_nh;

1.3. Determine the part of speech of the VOB node;

a certain Li_nh is a noun, so it is not merged with the HED node;

"a certain Li" has child nodes, so all nodes beneath it are merged: the child node "ambassador", the child node "a certain country", and "stationed in" with its child node "China";

output: ambassador of a certain country to China, a certain Li_nh;

1.4. Determine the part of speech of the ADV node;

the 25th_nt is of another part of speech, so it is not merged with the HED node;

"the 25th" has no child nodes, so output: the 25th_nt;

1.5. Determine the part of speech of the COO node;

propose_v is a verb, so enter the flow with "propose" as the sentence core;

1.6. Find the second-level child nodes of "summon":

none;

2. Flow with "propose" as the sentence core;

2.1. Find the first-level child nodes of "propose":

VOB: negotiation_n;

ADV: for_p;

COO: urge_v;

2.2. Determine the part of speech of the VOB node;

negotiation_n is a noun, so it is not merged with the HED node;

"negotiation" has child nodes, so all nodes beneath it are merged: "solemn", "countermeasure", "and" and "strong";

output: solemn negotiation and strong countermeasure_n;

2.3. Determine the part of speech of the ADV node;

for_p is a preposition, so it is not merged with the HED node;

output: for_p;

then determine the part of speech of the POB node: pass_v is a verb, so enter the flow with "pass" as the sentence core;

2.4. Determine the part of speech of the COO node;

urge_v is a verb, so enter the flow with "urge" as the sentence core;

2.5. Find the second-level child nodes of "propose":

none;

3. Flow with "pass" as the sentence core;

3.1. Find the first-level child nodes of "pass";

determine the first-level child nodes of "pass":

VOB: act_n;

3.2. Determine the part of speech of the VOB node;

act_n is a noun, so it is not merged with the HED node;

"act" has child nodes, so all nodes beneath it are merged, including "a certain region", "economy" and "trade";

output: economic and trade act of a certain region_n;

3.3. Find the second-level child nodes of "pass":

none;

4. Flow with "urge" as the sentence core;

4.1. Find the first-level child nodes of "urge":

DBL: a certain party_n;

VOB: correct_v;

4.2. Determine whether the DBL node has child nodes;

the DBL node is not merged with the HED node, and it has no child nodes;

output: a certain party_n;

4.3. Determine the part of speech of the VOB node;

correct_v is a verb, so enter the flow with "correct" as the sentence core;

4.4. Find the second-level child nodes of "urge":

none;

5. Flow with "correct" as the sentence core;

5.1. Find the first-level child nodes of "correct":

VOB: error_n;

ADV: immediately_d;

5.2. Determine the part of speech of the VOB node;

error_n is a noun, so it is not merged with the HED node;

it has no child nodes, so output: error_n;

5.3. Determine the part of speech of the ADV node;

immediately_d is an adverb, is immediately adjacent to the HED node, and has no child nodes, so it is merged with the HED node;

output: immediately correct_v;

5.4. Find the second-level child nodes of "correct":

none;

After the five sentence-core flows above, the collated final output is:

deputy minister of the logistics department, a certain Wang_nh;

the 25th_nt;

summon_v;

ambassador of a certain country to China, a certain Li_nh;

for_p;

a certain organization_n;

pass_v;

economic and trade act of a certain region_n;

propose_v;

solemn negotiation and strong countermeasure_n;

urge_v;

a certain party_n;

immediately correct_v;

error_n.

That is, each phrase is output together with its corresponding part of speech.

Take another sentence to be processed as an example: "As for the relevant reform and innovation measures, it is necessary both to monitor the specific effects in a timely manner and to choose the right moment to do the adjustment and promotion work well."

the results of word segmentation and part-of-speech tagging are as follows:

as for_p relevant_n reform_n and_c innovation_n measures_n ，_wp both_d must_v timely_d monitor_v specific_a effect_n ，_wp also_d must_v choose the moment_v do_v well_a adjustment_n and_c promotion_n work_n 。_wp

The dependency syntax results are:

No.	Word	Part of speech	Parent node	Dependency relation
1	as for	p	11	ADV
2	relevant	n	6	ATT
3	reform	n	6	ATT
4	and	c	5	LAD
5	innovation	n	3	COO
6	measures	n	1	POB
7	，	wp	1	WP
8	both	d	9	ADV
9	must	v	11	ADV
10	timely	d	11	ADV
11	monitor	v	0	HED
12	specific	a	13	ATT
13	effect	n	11	VOB
14	，	wp	11	WP
15	also	d	16	ADV
16	must	v	17	ADV
17	choose the moment	v	11	COO
18	do	v	17	COO
19	well	a	18	CMP
20	adjustment	n	23	ATT
21	and	c	22	LAD
22	promotion	n	20	COO
23	work	n	18	VOB
24	。	wp	18	WP
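The table above maps directly onto a list of records, from which the sentence core and its first-level child nodes can be read off. The sketch below transcribes the rows of the table, using simplified English glosses for the words (an assumption of this sketch); the function name `first_level_children` is hypothetical.

```python
from collections import defaultdict

# (number, word, part of speech, parent node, dependency relation)
ROWS = [
    (1, "as for", "p", 11, "ADV"), (2, "relevant", "n", 6, "ATT"),
    (3, "reform", "n", 6, "ATT"), (4, "and", "c", 5, "LAD"),
    (5, "innovation", "n", 3, "COO"), (6, "measures", "n", 1, "POB"),
    (7, "，", "wp", 1, "WP"), (8, "both", "d", 9, "ADV"),
    (9, "must", "v", 11, "ADV"), (10, "timely", "d", 11, "ADV"),
    (11, "monitor", "v", 0, "HED"), (12, "specific", "a", 13, "ATT"),
    (13, "effect", "n", 11, "VOB"), (14, "，", "wp", 11, "WP"),
    (15, "also", "d", 16, "ADV"), (16, "must", "v", 17, "ADV"),
    (17, "choose the moment", "v", 11, "COO"), (18, "do", "v", 17, "COO"),
    (19, "well", "a", 18, "CMP"), (20, "adjustment", "n", 23, "ATT"),
    (21, "and", "c", 22, "LAD"), (22, "promotion", "n", 20, "COO"),
    (23, "work", "n", 18, "VOB"), (24, "。", "wp", 18, "WP"),
]


def first_level_children(rows, head_idx):
    """Group the direct child nodes of head_idx by dependency relation."""
    grouped = defaultdict(list)
    for idx, word, pos, head, rel in rows:
        if head == head_idx:
            grouped[rel].append(f"{word}_{pos}")
    return dict(grouped)


core = next(row for row in ROWS if row[4] == "HED")   # the row whose relation is HED
print(core[1], first_level_children(ROWS, core[0]))
```

The grouping for the core "monitor" contains one VOB node, three ADV nodes and one COO node (plus punctuation), which is the starting point of the extraction flow that follows.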

The phrase extraction process is as follows:

1. Flow with "monitor" as the sentence core;

1.1. Find the first-level child nodes of "monitor":

VOB: effect_n;

ADV: timely_d;

ADV: must_v;

ADV: as for_p;

COO: choose the moment_v;

1.2. Determine the part of speech of the VOB node;

effect_n is a noun, so it is not merged with the HED node. "effect" has a child node, so all nodes beneath it are merged: "specific" is merged with "effect";

output: specific effect_n;

1.3. Determine the part of speech of the ADV nodes;

timely_d is an adverb, is immediately adjacent to the HED node, and has no child nodes, so it is merged with the HED node;

output: timely monitor_v;

must_v is a verb, so enter the flow with "must" as the sentence core;

as for_p is a preposition, so it is not merged with the HED node;

output: as for_p;

then determine the part of speech of the POB node that is the child node of this ADV node: measures_n is of another part of speech and has child nodes, so it is merged with its child nodes "relevant" and "reform", together with "innovation" (the child node of "reform") and "and" (the child node of "innovation");

output: relevant reform and innovation measures_n;

1.4. Determine the part of speech of the COO node;

choose the moment_v is a verb, so enter the flow with "choose the moment" as the sentence core;

1.5. Find the second-level child nodes of "monitor":

none;

2. Flow with the first "must" as the sentence core;

2.1. Find the first-level child nodes of "must":

ADV: both_d;

2.2. Determine the part of speech of the ADV node;

both_d is an adverb, is immediately adjacent to the HED node, and has no child nodes, so it is merged with the HED node and the part of speech of the HED node is kept;

output: both must_v;

3. Flow with "choose the moment" as the sentence core;

3.1. Find the first-level child nodes of "choose the moment":

ADV: must_v;

COO: do_v;

3.2. Determine the part of speech of the ADV node;

3.3. Determine the part of speech of the COO node;

do_v is a verb, so enter the flow with "do" as the sentence core;

4. Flow with the second "must" as the sentence core;

4.1. Find the first-level child nodes of "must":

ADV: also_d;

also_d is an adverb, is immediately adjacent to the HED node, and has no child nodes, so it is merged with the HED node and the part of speech of the HED node is kept;

output: also must_v;

5. Flow with "do" as the sentence core;

5.1. Find the first-level child nodes of "do":

VOB: work_n;

CMP: well_a;

5.2. Determine the part of speech of the VOB node;

work_n is a noun, so it is not merged with the HED node. "work" has child nodes, so all nodes beneath it are merged: "adjustment", its child node "promotion", and the child node "and" of "promotion";

output: adjustment and promotion work_n;

5.3. Determine the part of speech of the CMP node;

well_a is an adjective and has no child nodes, so it is merged with the HED node and the part of speech of the HED node is kept;

output: do well_v;

The collated final output is:

as for_p;

relevant reform and innovation measures_n;

both must_v;

timely monitor_v;

specific effect_n;

also must_v;

choose the moment_v;

do well_v;

adjustment and promotion work_n;

This method extracts phrases from sentences whose core is a verb. It analyses the grammar of modern Chinese using three basic NLP modules (word segmentation, part-of-speech tagging and dependency parsing), extracts fixed usages, and forms merging rules from them. No training corpus is required, the rules can be combined flexibly, and the granularity of the phrases can be changed by adding or removing steps.
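As a rough end-to-end illustration, the rules can be condensed into one recursive function and run on the dependency table of the second worked example. This is a simplified sketch under several assumptions, not the claimed implementation: the English glosses stand in for the original words, "adjacent" is taken to mean neighbouring positions, a verb ADV node is treated as a new sentence core (as the worked example does), step S313 is omitted, and the invalid-sentence subset is not modelled.

```python
from collections import namedtuple

Tok = namedtuple("Tok", "idx word pos head rel")

# Dependency table of the second example; glosses are simplified for this sketch.
ROWS = [
    (1, "as for", "p", 11, "ADV"), (2, "relevant", "n", 6, "ATT"),
    (3, "reform", "n", 6, "ATT"), (4, "and", "c", 5, "LAD"),
    (5, "innovation", "n", 3, "COO"), (6, "measures", "n", 1, "POB"),
    (7, "，", "wp", 1, "WP"), (8, "both", "d", 9, "ADV"),
    (9, "must", "v", 11, "ADV"), (10, "timely", "d", 11, "ADV"),
    (11, "monitor", "v", 0, "HED"), (12, "specific", "a", 13, "ATT"),
    (13, "effect", "n", 11, "VOB"), (14, "，", "wp", 11, "WP"),
    (15, "also", "d", 16, "ADV"), (16, "must", "v", 17, "ADV"),
    (17, "choose the moment", "v", 11, "COO"), (18, "do", "v", 17, "COO"),
    (19, "well", "a", 18, "CMP"), (20, "adjustment", "n", 23, "ATT"),
    (21, "and", "c", 22, "LAD"), (22, "promotion", "n", 20, "COO"),
    (23, "work", "n", 18, "VOB"), (24, "。", "wp", 18, "WP"),
]
TOKS = {row[0]: Tok(*row) for row in ROWS}
HANDLED = {"SBV", "VOB", "IOB", "FOB", "DBL", "ADV", "CMP", "POB", "COO"}


def kids(idx):
    return [t for t in TOKS.values() if t.head == idx]


def subtree(idx):
    out = {idx}
    for k in kids(idx):
        out |= subtree(k.idx)
    return out


def phrase(members, pos):
    """(first position, 'words_pos') so that phrases can be sorted by position."""
    members = sorted(members)
    return members[0], " ".join(TOKS[i].word for i in members) + "_" + pos


def run_core(hed, out):
    """One sentence-core flow: visit first-level children, then emit the core."""
    core = {hed.idx}                          # words merged into the core phrase
    for c in kids(hed.idx):
        if c.rel not in HANDLED:
            continue
        if c.pos == "v" and c.rel not in {"IOB", "DBL"}:
            run_core(c, out)                  # verb child becomes a new sentence core
        elif c.pos == "p" and c.rel in {"VOB", "FOB", "ADV"}:
            out.append(phrase({c.idx}, c.pos))
            for pob in (k for k in kids(c.idx) if k.rel == "POB"):
                if pob.pos == "v":
                    run_core(pob, out)
                else:
                    out.append(phrase(subtree(pob.idx), pob.pos))
        elif (c.rel == "ADV" and c.pos == "d" and abs(c.idx - hed.idx) == 1) \
                or (c.rel == "CMP" and c.pos == "a"):
            core |= subtree(c.idx)            # merged into the core, which keeps its POS
        elif c.rel == "COO":
            pass                              # non-verb COO: sentence is set aside as invalid
        else:
            out.append(phrase(subtree(c.idx), c.pos))
    out.append(phrase(core, hed.pos))


collected = []
run_core(next(t for t in TOKS.values() if t.rel == "HED"), collected)
result = [text for _, text in sorted(collected)]
print("\n".join(result))
```

Sorting the collected phrases by the position of their first word yields the same nine phrases, in the same order, as the final output listed for the second example.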

The following are embodiments of the apparatus of the present application that may be used to implement embodiments of the above-described phrase extraction method of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the phrase extraction method of the present application.

Fig. 4 is a block diagram of a phrase extraction device according to an embodiment of the present application. As shown in fig. 4, the apparatus includes: a sentence obtaining module 410, a tag generating module 420, a part of speech judging module 430, a target word searching module 440 and a merging judging module 450.

A sentence obtaining module 410, configured to obtain a sentence to be processed;

a tag generation module 420, configured to perform word segmentation, part-of-speech tagging and dependency syntax processing on the sentence to be processed in sequence, and generate dependency relationship tags between different words and part-of-speech tags of each word;

a part-of-speech determining module 430, configured to determine whether the core relation word with the core relation tag is a verb according to the dependency relation tags between different words and the part-of-speech tag of each word;

a target word searching module 440, configured to search, when the core relation word is a verb, for a target word that forms a specified dependency relationship with the core relation word;

and a merging judgment module 450, configured to determine whether to merge and output the core relation word and the target word according to the tag information of the target word.

The implementation process of the functions and actions of each module in the device is specifically detailed in the implementation process of the corresponding step in the phrase extraction method, and is not repeated here.
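One possible way to wire the five modules together is sketched below. Everything here is hypothetical: the class and method names, the `Tagged` record, and in particular `stub_tagger`, which returns a canned analysis in place of a real word segmentation, part-of-speech tagging and dependency parsing toolkit. The merge decision covers only the adjacent-adverb and adjective-complement cases described in the method.

```python
from typing import Callable, List, NamedTuple, Optional


class Tagged(NamedTuple):
    idx: int
    word: str
    pos: str
    head: int
    rel: str


class PhraseExtractionDevice:
    """Hypothetical wiring of modules 410-450; `tagger` stands in for a real NLP toolkit."""

    def __init__(self, tagger: Callable[[str], List[Tagged]]):
        self.tagger = tagger

    def obtain_sentence(self, raw: str) -> str:                    # module 410
        return raw.strip()

    def generate_tags(self, sentence: str) -> List[Tagged]:        # module 420
        return self.tagger(sentence)

    def verb_core(self, tags: List[Tagged]) -> Optional[Tagged]:   # module 430
        core = next((t for t in tags if t.rel == "HED"), None)
        return core if core is not None and core.pos == "v" else None

    def find_targets(self, tags, core, relations=("ADV", "CMP")):  # module 440
        return [t for t in tags if t.head == core.idx and t.rel in relations]

    def should_merge(self, core: Tagged, target: Tagged) -> bool:  # module 450
        if target.rel == "ADV":
            return target.pos == "d" and abs(target.idx - core.idx) == 1
        return target.rel == "CMP" and target.pos == "a"


def stub_tagger(sentence):
    # Canned analysis standing in for segmentation, tagging and parsing.
    return [Tagged(1, "timely", "d", 2, "ADV"), Tagged(2, "monitor", "v", 0, "HED")]


device = PhraseExtractionDevice(stub_tagger)
tags = device.generate_tags(device.obtain_sentence(" timely monitor "))
core = device.verb_core(tags)
decisions = {t.word: device.should_merge(core, t) for t in device.find_targets(tags, core)}
print(decisions)
```

Passing the tagger in as a callable keeps the sketch independent of any particular NLP library, in line with the statement that the modules may be integrated or exist separately.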

In the embodiments provided in the present application, the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.

Claims

1. A method of phrase extraction, the method comprising:

obtaining a sentence to be processed;

2. The method of claim 1, wherein the finding the target word that constitutes a specified dependency relationship with the core relationship word comprises:

3. The method according to claim 2, wherein if the part of speech of the target word is an adverb and the target word is adjacent to the core relation word, merging and outputting the core relation word and the target word, comprising:

4. The method according to claim 2, wherein the determining whether to perform the merged output of the core relation word and the target word according to the tag information of the target word comprises:

5. The method according to claim 2, wherein the determining whether to perform the merged output of the core relation word and the target word according to the tag information of the target word comprises:

6. The method of claim 1, wherein the finding the target word that constitutes a specified dependency relationship with the core relationship word comprises:

7. The method of claim 6, wherein if the part of speech of the target word is an adjective, merging and outputting the core relation word and the target word, comprising:

8. The method according to claim 6, wherein the determining whether to perform the merged output of the core relation word and the target word according to the tag information of the target word comprises:

9. The method of claim 1, wherein the finding the target word that constitutes a specified dependency relationship with the core relationship word comprises:

10. The method according to claim 9, wherein the determining whether to perform the merged output of the core relation word and the target word according to the tag information of the target word comprises:

11. The method according to claim 9, wherein the determining whether to perform the merged output of the core relation word and the target word according to the tag information of the target word comprises:

12. The method of claim 1, wherein the finding the target word that constitutes a specified dependency relationship with the core relationship word comprises:

13. The method of claim 1, further comprising:

and outputting the corresponding part of speech of the phrase at the same time of outputting the phrase.

14. A phrase extraction apparatus, characterized in that the apparatus comprises:

15. An electronic device, characterized in that the electronic device comprises:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the phrase extraction method of any of claims 1-13.