CN112434515A - Statement compression method and device, electronic equipment and readable storage medium - Google Patents


Info

Publication number
CN112434515A
CN112434515A (application CN202011386421.4A)
Authority
CN
China
Prior art keywords
sentence, sentences, target, key, statement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011386421.4A
Other languages
Chinese (zh)
Inventor
刘臣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianmian Information Technology Shenzhen Co ltd
Original Assignee
Tianmian Information Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianmian Information Technology Shenzhen Co ltd filed Critical Tianmian Information Technology Shenzhen Co ltd
Priority to CN202011386421.4A priority Critical patent/CN112434515A/en
Publication of CN112434515A publication Critical patent/CN112434515A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/242 Dictionaries
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to data processing and discloses a sentence compression method comprising the following steps: performing spoken-language removal on the sentence to be compressed to obtain a target sentence set, and judging whether the number of sentences in the target sentence set is greater than a first threshold; when the number of sentences is judged to be greater than the first threshold, ranking the sentences in the target sentence set by importance, extracting a key sentence based on the ranking result, and judging whether the length of the key sentence is greater than a second threshold; and when the length of the key sentence is judged to be greater than the second threshold, extracting the stem words of the key sentence and splicing them to obtain the target sentence. The invention also provides a sentence compression apparatus, an electronic device, and a readable storage medium. The invention reduces labeling cost and ensures the semantic accuracy of the compressed sentence.

Description

Statement compression method and device, electronic equipment and readable storage medium
Technical Field
The present invention relates to the field of data processing, and in particular, to a method and an apparatus for compressing a statement, an electronic device, and a readable storage medium.
Background
Sentence compression is an important research direction in the field of natural language processing: it removes redundant information from sentences while retaining the main idea, which facilitates both human reading and machine recognition, and it can be applied in many fields such as abstract generation, question matching, and topic extraction.
At present, generative or extractive sentence compression methods are usually adopted. However, generative methods require a large amount of labeled corpora for supervised learning and are unsuitable when project scale and cost are limited, business data are scarce, or labeled data are lacking. Traditional extractive methods are sensitive to sentence length: when the sentence is long, their compression effect is not ideal and semantic information cannot be accurately preserved. Therefore, a sentence compression method is needed that reduces labeling cost while ensuring the semantic accuracy of the compressed sentence.
Disclosure of Invention
In view of the above, there is a need to provide a sentence compression method that reduces labeling cost and ensures the semantic accuracy of the compressed sentence.
The statement compression method provided by the invention comprises the following steps:
analyzing a statement compression request sent by a user based on a client, acquiring a to-be-compressed statement carried by the request, performing spoken language removal processing on the to-be-compressed statement to obtain a target statement set, and judging whether the number of sentences in the target statement set is greater than a first threshold value or not;
when the number of sentences in the target sentence set is judged to be larger than a first threshold value, sorting the importance of the sentences in the target sentence set, extracting key sentences based on a sorting result, and judging whether the sentence length of the key sentences is larger than a second threshold value;
and when the length of the key sentence is judged to be greater than the second threshold, extracting the stem words of the key sentence, and splicing the stem words to obtain the target sentence.
Optionally, the performing spoken language removal processing on the to-be-compressed statement includes:
acquiring a spoken-sentence dictionary from a first database, comparing each first clause in the sentence to be compressed with the spoken-sentence dictionary, and deleting any first clause that matches a sentence in the dictionary to obtain an initial sentence set;
performing word segmentation processing on the sentences in the initial sentence set to obtain a first word sequence;
recognizing the spoken words in the first word sequence based on a spoken word recognition model, and deleting the spoken words to obtain a second word sequence;
and splicing the words in the second word sequence according to the positions of the words in the sentence to be compressed to obtain a plurality of second clauses, and taking the set of the second clauses as a target sentence set.
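The four sub-steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the spoken-sentence dictionary, the whitespace tokenizer, and the spoken-word set are hypothetical stand-ins for the first database, the word-segmentation model, and the spoken-word recognition model.

```python
# Hypothetical stand-ins for the first database and the recognition model.
SPOKEN_SENTENCE_DICT = {"these i already know", "there is no way"}
SPOKEN_WORDS = {"um", "uh", "well", "like"}

def remove_spoken_language(sentence_to_compress: str) -> list[str]:
    # Step one: drop first clauses that match the spoken-sentence dictionary.
    clauses = [c.strip() for c in sentence_to_compress.split(".") if c.strip()]
    initial_set = [c for c in clauses if c.lower() not in SPOKEN_SENTENCE_DICT]
    target_set = []
    for clause in initial_set:
        # Step two: naive whitespace segmentation stands in for the real model.
        words = clause.split()
        # Step three: delete recognized spoken words.
        kept = [w for w in words if w.lower().strip(",") not in SPOKEN_WORDS]
        # Step four: re-splice the remaining words in their original order.
        if kept:
            target_set.append(" ".join(kept))
    return target_set

print(remove_spoken_language(
    "Um, I can borrow. These I already know. Wages come on the 15th."))
# → ['I can borrow', 'Wages come on the 15th']
```

A production version would replace the dictionary lookup and word filter with the database query and trained recognition model the claims describe.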
Optionally, the sorting the importance of the sentences in the target sentence set and extracting the key sentences based on the sorting result include:
combining each sentence in the target sentence set with other sentences pairwise to obtain a plurality of combination pairs;
calculating similarity values of two sentences of each combination pair in the plurality of combination pairs, and determining a similarity matrix corresponding to the target sentence set based on the similarity values;
and calculating the importance scores of the sentences in the target sentence set based on the similarity matrix, sequencing the sentences in the target sentence set according to the sequence of the importance scores from high to low, and taking the sentence with the highest sequence as a key sentence.
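The similarity matrix in the steps above can be built, for example, with cosine similarity over bag-of-words vectors. The tokenization below is a hypothetical simplification of the patent's word-segmentation step; the claims do not fix a particular similarity measure.

```python
import math
from collections import Counter

def cosine_sim(a: str, b: str) -> float:
    # Bag-of-words cosine similarity between two sentences.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def similarity_matrix(sentences: list[str]) -> list[list[float]]:
    # Pairwise similarities for every combination pair; symmetric with a
    # unit diagonal, matching the structure of Table 1 below.
    n = len(sentences)
    return [[cosine_sim(sentences[i], sentences[j]) for j in range(n)]
            for i in range(n)]

m = similarity_matrix(["i can borrow", "i wait for wages", "wages come monthly"])
print(round(m[0][0], 2))  # → 1.0 (each sentence is identical to itself)
```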
Optionally, after determining whether the number of sentences in the target sentence set is greater than a first threshold, the method further includes:
if the number of sentences in the target sentence set is judged to be less than or equal to a first threshold value, determining the sentence type of the sentence to be compressed, acquiring an extraction rule corresponding to the sentence type from a second database, extracting the sentences from the target sentence set based on the extraction rule, and splicing the extracted sentences to obtain a key sentence.
Optionally, the extracting the stem words of the key sentences includes:
performing word segmentation processing on the key sentence to obtain a third word sequence;
sequentially identifying the part of speech of each word in the third word sequence, determining a syntactic structure of the third word sequence based on the part of speech and a preset syntactic analysis strategy, and extracting the stem word in the third word sequence based on the syntactic structure.
Optionally, after determining whether the sentence length of the key sentence is greater than a second threshold, the method further includes:
and if the sentence length of the key sentence is judged to be less than or equal to a second threshold value, taking the key sentence as a target sentence.
In order to solve the above problem, the present invention also provides a sentence compression apparatus, comprising:
the analysis module is used for analyzing a statement compression request sent by a user based on a client, acquiring a statement to be compressed carried by the request, executing spoken language removal processing on the statement to be compressed to obtain a target statement set, and judging whether the number of the sentences in the target statement set is greater than a first threshold value or not;
the sorting module is used for sorting the importance of the sentences in the target sentence set when the number of the sentences in the target sentence set is judged to be larger than a first threshold value, extracting key sentences based on a sorting result, and judging whether the sentence length of the key sentences is larger than a second threshold value or not;
and the extraction module is used for extracting the stem words of the key sentence and splicing the stem words to obtain the target sentence when the length of the key sentence is judged to be greater than the second threshold.
Optionally, the sorting the importance of the sentences in the target sentence set and extracting the key sentences based on the sorting result include:
combining each sentence in the target sentence set with other sentences pairwise to obtain a plurality of combination pairs;
calculating similarity values of two sentences of each combination pair in the plurality of combination pairs, and determining a similarity matrix corresponding to the target sentence set based on the similarity values;
and calculating the importance scores of the sentences in the target sentence set based on the similarity matrix, sequencing the sentences in the target sentence set according to the sequence of the importance scores from high to low, and taking the sentence with the highest sequence as a key sentence.
In order to solve the above problem, the present invention also provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a sentence compression program executable by the at least one processor, the sentence compression program being executed by the at least one processor to enable the at least one processor to perform the above sentence compression method.
In order to solve the above problem, the present invention also provides a computer-readable storage medium having stored thereon a sentence compression program executable by one or more processors to implement the above sentence compression method.
Compared with the prior art, the invention first performs spoken-language removal on the sentence to be compressed to obtain a target sentence set; this removes spoken sentences and spoken words that carry no semantic information, achieving a preliminary compression. Second, when the number of sentences in the target sentence set is judged to be greater than a first threshold, the sentences are ranked by importance and a key sentence is extracted based on the ranking result; this further removes redundant information while retaining the semantic information of the sentence to be compressed. Finally, when the length of the key sentence is judged to be greater than a second threshold, the stem words of the key sentence are extracted and spliced to obtain the target sentence. The invention thus reduces labeling cost and ensures the semantic accuracy of the compressed sentence.
Drawings
Fig. 1 is a schematic flow chart of a sentence compression method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a sentence compressing apparatus according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device implementing a statement compression method according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the description relating to "first", "second", etc. in the present invention is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
The invention provides a statement compression method. Fig. 1 is a schematic flow chart of a sentence compression method according to an embodiment of the present invention. The method may be performed by an electronic device, which may be implemented by software and/or hardware.
In this embodiment, the statement compression method includes:
s1, analyzing a statement compression request sent by a user based on a client, acquiring a to-be-compressed statement carried by the request, executing spoken language removal processing on the to-be-compressed statement to obtain a target statement set, and judging whether the number of sentences in the target statement set is greater than a first threshold value.
The executing the spoken language removal processing on the statement to be compressed includes:
a11, acquiring a spoken-sentence dictionary from a first database, comparing each first clause in the sentence to be compressed with the spoken-sentence dictionary, and deleting any first clause that matches a sentence in the dictionary to obtain an initial sentence set;
a12, performing word segmentation processing on the sentences in the initial sentence set to obtain a first word sequence;
in this embodiment, the sentences in the initial sentence set may be segmented by using a statistical probability model or/and a segmentation method based on an N-gram language model.
A13, recognizing the spoken words in the first word sequence based on a spoken word recognition model, and deleting the spoken words to obtain a second word sequence;
in this embodiment, the spoken language word recognition model is a deep neural network model, and the deep neural network model recognizes part-of-speech tags of each word in the first word sequence, and rejects spoken languages (such as language, spoken words, and the like) based on the part-of-speech tags.
A14, splicing the words in the second word sequence according to the positions of the words in the sentence to be compressed to obtain a plurality of second clauses, and taking the set of the second clauses as a target sentence set.
In this embodiment, the sentence to be compressed is a long sentence composed of a plurality of sentences, and the spoken sentence dictionary stores a plurality of spoken sentences without semantic information.
For example, suppose the sentence to be compressed is: "Um, these things I already know. I can borrow. My wages are paid on the 15th of each month. Besides, I can only wait for the payroll on the 15th. Wages are issued on the 15th of every month. There is no way around it."
The set of sentences remaining after the filler sentence "these things I already know" and the modal word "um" are removed is taken as the target sentence set.
S2, when the number of sentences in the target sentence set is judged to be larger than a first threshold value, the sentences in the target sentence set are subjected to importance ranking, key sentences are extracted based on the ranking result, and whether the sentence length of the key sentences is larger than a second threshold value is judged.
In this embodiment, the first threshold may be 5.
The sorting the importance of the sentences in the target sentence set and extracting the key sentences based on the sorting result comprises the following steps:
b11, combining each sentence in the target sentence set with other sentences pairwise respectively to obtain a plurality of combination pairs;
b12, calculating similarity values of two sentences of each combination pair in the plurality of combination pairs, and determining a similarity matrix corresponding to the target sentence set based on the similarity values;
b13, calculating the importance scores of the sentences in the target sentence set based on the similarity matrix, sequencing the sentences in the target sentence set according to the sequence of the importance scores from high to low, and taking the sentence with the highest sequence as a key sentence.
Assume that the similarity values corresponding to each combination pair in the target sentence set are shown in Table 1 below:

                  Sentence 1   Sentence 2   Sentence 3
    Sentence 1    1            0.63         0.44
    Sentence 2    0.63         1            0.78
    Sentence 3    0.44         0.78         1

TABLE 1
Then the similarity matrix corresponding to the target sentence set is s = [[1, 0.63, 0.44], [0.63, 1, 0.78], [0.44, 0.78, 1]].
The importance score is calculated as:

w_i = (1 − d) + d · (s · w′)_i

where w_i is the importance score of the i-th sentence in the target sentence set, d is a damping coefficient (ranging from 0 to 1, typically 0.85), s is the similarity matrix corresponding to the target sentence set, and w′_i is the importance score of the i-th sentence from the previous iteration.
In this embodiment, the initial importance score of each sentence is 1; the final scores are computed by iterative propagation according to the formula above, and the iteration is considered converged when the score change of every sentence falls below a given limit (e.g., 0.0001).
In this embodiment, the similarity value of the two sentences in each combination pair can be calculated using algorithms such as cosine similarity, Euclidean distance, Manhattan distance, or Minkowski distance.
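The iterative scoring can be sketched as follows using the Table 1 similarities. One caveat: for the iteration to converge, the sketch zeroes the diagonal and column-normalizes the matrix, as is standard in TextRank-style ranking; this normalization is an assumption, not something the patent spells out.

```python
# Similarity matrix from Table 1.
S = [[1.0, 0.63, 0.44],
     [0.63, 1.0, 0.78],
     [0.44, 0.78, 1.0]]

def importance_scores(sim, d=0.85, tol=1e-4, max_iter=100):
    n = len(sim)
    # Zero the diagonal, then normalize each column to sum to 1 (assumed,
    # TextRank-style, so that w = (1-d) + d*(m @ w') converges).
    m = [[sim[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    col = [sum(m[i][j] for i in range(n)) for j in range(n)]
    m = [[m[i][j] / col[j] if col[j] else 0.0 for j in range(n)]
         for i in range(n)]
    w = [1.0] * n                      # initial importance score of 1 per sentence
    for _ in range(max_iter):
        w_new = [(1 - d) + d * sum(m[i][j] * w[j] for j in range(n))
                 for i in range(n)]
        if max(abs(a - b) for a, b in zip(w_new, w)) < tol:
            return w_new               # converged: every change below the limit
        w = w_new
    return w

scores = importance_scores(S)
print(max(range(3), key=lambda i: scores[i]))  # → 1 (sentence 2 ranks highest)
```

Sentence 2 scores highest because it is the most similar to both other sentences, so it would be extracted as the key sentence.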
After determining whether the number of sentences in the target sentence set is greater than a first threshold, the method further comprises:
if the number of sentences in the target sentence set is judged to be less than or equal to a first threshold value, determining the sentence type of the sentence to be compressed, acquiring an extraction rule corresponding to the sentence type from a second database, extracting the sentences from the target sentence set based on the extraction rule, and splicing the extracted sentences to obtain a key sentence.
In this embodiment, when the number of sentences in the target sentence set is less than or equal to the first threshold (e.g., 5), the sentence type of the sentence to be compressed is determined. Sentence types include question, answer, and declarative patterns, and the second database stores extraction rules corresponding to each type in advance. For example, the rule for the question pattern may be to extract the two sentences of the target sentence set located at the end of the sentence to be compressed; the rule for the answer pattern may be to extract the two sentences located at the beginning and the end; and the rule for the declarative pattern may be to extract the three sentences located at the beginning, the middle, and the end.
In this embodiment, the extraction rule is not limited, and the user may set the corresponding extraction rule according to a specific scenario.
And splicing the extracted sentences according to the sequence of the sentences in the sentences to be compressed to obtain the key sentences.
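The rule-based path can be sketched as a small rule table. The rules mirror the examples given above; the sentence-type label is assumed to come from an upstream classifier that the sketch does not implement.

```python
# Hypothetical rule table mirroring the examples in the description:
# each rule picks sentences by position within the target sentence set.
EXTRACTION_RULES = {
    "question":  lambda s: s[-2:],                                   # last two
    "answer":    lambda s: [s[0], s[-1]] if len(s) > 1 else s,       # first + last
    "statement": lambda s: ([s[0], s[len(s) // 2], s[-1]]            # first, middle, last
                            if len(s) > 2 else s),
}

def extract_key_sentence(target_set: list[str], sentence_type: str) -> str:
    extracted = EXTRACTION_RULES[sentence_type](target_set)
    # Splice in the order the sentences appear in the sentence to be compressed.
    return " ".join(extracted)

print(extract_key_sentence(["a", "b", "c", "d"], "answer"))  # → a d
```

As the description notes, the rules are not fixed: a user could register a different callable per scenario without touching the splicing logic.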
And S3, when the sentence length of the key sentence is judged to be greater than the second threshold, extracting the stem words of the key sentence, and splicing the stem words to obtain the target sentence.
In this embodiment, the extracting the stem words of the key sentence includes:
c11, performing word segmentation processing on the key sentence to obtain a third word sequence;
and C12, sequentially identifying the part of speech of each word in the third word sequence, determining the syntactic structure of the third word sequence based on the part of speech and a preset syntactic analysis strategy, and extracting the stem words in the third word sequence based on the syntactic structure.
In this embodiment, the parts of speech include nouns, verbs, adjectives, prepositions, negative words, adverbs, auxiliary words, and the like.
The preset syntactic analysis strategy is dependency syntactic analysis, and the determining of the syntactic structure of the third word sequence based on the part of speech and the preset syntactic analysis strategy comprises:
d11, determining a core word in the third word sequence based on the part of speech;
usually, the verb is a core word (there is usually only one verb in a sentence).
D12, determining the membership among the words in the third word sequence;
for example, if the key statement is: i eat a big apple, then the third word sequence is { I, eat, one, big, apple }, and its core word is "eat".
When the dependency relationship is analyzed, "one" belongs to "apple" and "big" also belongs to "apple".
D13, determining the syntactic structure of the third word sequence according to the core words and the affiliations.
The syntactic structure types include: subject-verb relationship, verb-object relationship, indirect-object relationship, fronted-object relationship, attribute (modifier-head) relationship, adverbial-head structure, verb-complement structure, coordinate relationship, preposition-object relationship, left-adjunct relationship, and right-adjunct relationship.
In the third word sequence {I, eat, a, big, apple}, "I" depends on "eat" through a subject-verb relationship, "a" and "big" depend on "apple" through attribute relationships, and "apple" depends on "eat" through a verb-object relationship.
The stem words extracted according to the syntax structure are 'I', 'eat' and 'apple'.
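The worked example can be sketched as follows. The dependency parse is hard-coded here (a real system would obtain it from a parser such as LTP or spaCy), and the set of stem-bearing relations is an assumption modeled on the example's result.

```python
# (word, head index, relation) triples for "I eat a big apple";
# heads are 1-indexed, with 0 marking the root (the core word).
PARSE = [
    ("I",     2, "subject-verb"),
    ("eat",   0, "root"),
    ("a",     5, "attribute"),
    ("big",   5, "attribute"),
    ("apple", 2, "verb-object"),
]

# Relations whose words count as stem words -- an assumption chosen so
# that the example yields "I", "eat", "apple" as in the description.
STEM_RELATIONS = {"subject-verb", "root", "verb-object"}

def extract_stem_words(parse):
    # Keep words attached by stem-bearing relations, in sentence order.
    return [word for word, _head, rel in parse if rel in STEM_RELATIONS]

print(" ".join(extract_stem_words(PARSE)))  # → I eat apple
```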
In this embodiment, after determining whether the sentence length of the key sentence is greater than a second threshold, the method further includes:
and if the sentence length of the key sentence is judged to be less than or equal to a second threshold value, taking the key sentence as a target sentence.
As can be seen from the above embodiment, the sentence compression method provided by the invention first performs spoken-language removal on the sentence to be compressed to obtain a target sentence set; this removes spoken sentences and spoken words that carry no semantic information, achieving a preliminary compression. Second, when the number of sentences in the target sentence set is judged to be greater than the first threshold, the sentences are ranked by importance and a key sentence is extracted based on the ranking result; this further removes redundant information while retaining the semantic information of the sentence to be compressed. Finally, when the length of the key sentence is judged to be greater than the second threshold, the stem words of the key sentence are extracted and spliced to obtain the target sentence. The invention thus reduces labeling cost and ensures the semantic accuracy of the compressed sentence.
Fig. 2 is a block diagram of a sentence compressing apparatus according to an embodiment of the present invention.
The sentence compressing apparatus 100 of the present invention may be installed in an electronic device. According to the implemented functions, the sentence compressing apparatus 100 may include a parsing module 110, an ordering module 120, and an extracting module 130. The module of the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the parsing module 110 is configured to parse a statement compression request sent by a user based on a client, obtain a to-be-compressed statement carried by the request, perform a spoken language removal process on the to-be-compressed statement to obtain a target statement set, and determine whether the number of sentences in the target statement set is greater than a first threshold.
The executing the spoken language removal processing on the statement to be compressed includes:
a21, acquiring a spoken-sentence dictionary from a first database, comparing each first clause in the sentence to be compressed with the spoken-sentence dictionary, and deleting any first clause that matches a sentence in the dictionary to obtain an initial sentence set;
a22, performing word segmentation processing on the sentences in the initial sentence set to obtain a first word sequence;
in this embodiment, the sentences in the initial sentence set may be segmented by using a statistical probability model or/and a segmentation method based on an N-gram language model.
A23, recognizing the spoken words in the first word sequence based on a spoken word recognition model, and deleting the spoken words to obtain a second word sequence;
in this embodiment, the spoken language word recognition model is a deep neural network model, and the deep neural network model recognizes part-of-speech tags of each word in the first word sequence, and rejects spoken languages (such as language, spoken words, and the like) based on the part-of-speech tags.
A24, splicing the words in the second word sequence according to the positions of the words in the sentence to be compressed to obtain a plurality of second clauses, and taking the set of the second clauses as a target sentence set.
In this embodiment, the sentence to be compressed is a long sentence composed of a plurality of sentences, and the spoken sentence dictionary stores a plurality of spoken sentences without semantic information.
For example, suppose the sentence to be compressed is: "Um, these things I already know. I can borrow. My wages are paid on the 15th of each month. Besides, I can only wait for the payroll on the 15th. Wages are issued on the 15th of every month. There is no way around it."
The set of sentences remaining after the filler sentence "these things I already know" and the modal word "um" are removed is taken as the target sentence set.
The sorting module 120 is configured to, when it is determined that the number of sentences in the target sentence set is greater than a first threshold, sort the importance of the sentences in the target sentence set, extract key sentences based on a sorting result, and determine whether a sentence length of the key sentences is greater than a second threshold.
In this embodiment, the first threshold may be 5.
The sorting the importance of the sentences in the target sentence set and extracting the key sentences based on the sorting result comprises the following steps:
b21, combining each sentence in the target sentence set with other sentences pairwise respectively to obtain a plurality of combination pairs;
b22, calculating similarity values of two sentences of each combination pair in the plurality of combination pairs, and determining a similarity matrix corresponding to the target sentence set based on the similarity values;
b23, calculating the importance scores of the sentences in the target sentence set based on the similarity matrix, sequencing the sentences in the target sentence set according to the sequence of the importance scores from high to low, and taking the sentence with the highest sequence as a key sentence.
Assume that the similarity values corresponding to the respective combination pairs in the target sentence set are as shown in table 1 above.
Then the similarity matrix corresponding to the target sentence set is s = [[1, 0.63, 0.44], [0.63, 1, 0.78], [0.44, 0.78, 1]].
The importance score is calculated as:

w_i = (1 - d) + d * s * w_i'

where w_i is the importance score of the i-th sentence in the target sentence set, d is the damping coefficient (ranging from 0 to 1, typically 0.85), s is the similarity matrix corresponding to the target sentence set, and w_i' is the importance score of the i-th sentence obtained in the previous iteration.
In this embodiment, the initial importance score of each sentence is 1, the final importance score of each sentence is calculated by iterative propagation according to the above formula, and convergence is reached when the score change of every sentence is less than a given limit (e.g., 0.0001).
In this embodiment, the similarity value of the two sentences in each combination pair may be calculated using a cosine similarity, Euclidean distance, Manhattan distance, or Minkowski distance algorithm.
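The iterative scoring described by the formula above can be sketched as follows. The tiny similarity matrix is invented for illustration (the values of Table 1 are not reproduced here), and cosine similarity over bag-of-words counts stands in for whichever similarity measure is chosen:

```python
import math

def cosine_similarity(a, b):
    # a, b: bag-of-words count dictionaries for two sentences.
    common = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_sentences(sim, d=0.85, tol=1e-4, max_iter=100):
    # sim: n x n similarity matrix; iterate w_i = (1-d) + d * sum_j sim[i][j] * w_j
    n = len(sim)
    w = [1.0] * n  # initial importance score of each sentence is 1
    for _ in range(max_iter):
        new_w = [(1 - d) + d * sum(sim[i][j] * w[j] for j in range(n))
                 for i in range(n)]
        # Converged when no sentence's score changes more than the limit.
        if max(abs(new_w[i] - w[i]) for i in range(n)) < tol:
            return new_w
        w = new_w
    return w

# Toy 3-sentence similarity matrix (self-similarity set to 0).
sim = [[0.0, 0.4, 0.1],
       [0.4, 0.0, 0.3],
       [0.1, 0.3, 0.0]]
scores = rank_sentences(sim)
# The top-ranked sentence becomes the key sentence.
key_index = max(range(len(scores)), key=scores.__getitem__)
```

The middle sentence has the strongest links to the others, so it receives the highest score; this mirrors the TextRank-style propagation the formula describes.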
After determining whether the number of sentences in the target sentence set is greater than the first threshold, the sorting module 120 is further configured to:
if the number of sentences in the target sentence set is judged to be less than or equal to a first threshold value, determining the sentence type of the sentence to be compressed, acquiring an extraction rule corresponding to the sentence type from a second database, extracting the sentences from the target sentence set based on the extraction rule, and splicing the extracted sentences to obtain a key sentence.
In this embodiment, when the number of sentences in the target sentence set is less than or equal to the first threshold (e.g., 5), the sentence type of the sentence to be compressed is determined. Sentence types include a question pattern, an answer pattern, and a statement pattern, and the second database stores an extraction rule for each sentence type in advance. For example, the extraction rule for the question pattern may be to extract the two sentences of the target sentence set located at the end of the sentence to be compressed; the rule for the answer pattern may be to extract the two sentences located at the beginning and the end; and the rule for the statement pattern may be to extract the three sentences located at the beginning, middle, and end of the sentence to be compressed.
In this embodiment, the extraction rule is not limited, and the user may set the corresponding extraction rule according to a specific scenario.
The extracted sentences are spliced in the order in which they appear in the sentence to be compressed to obtain the key sentence.
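The fallback path for short sentence sets can be sketched as a small rule table. The rules below mirror the examples in the text, but the rule encoding and the sample clauses are invented for illustration:

```python
# Hypothetical extraction rules keyed by sentence type (the patent stores
# these in a second database); each rule picks positions in the clause list.
EXTRACTION_RULES = {
    "question":  lambda s: s[-2:],                         # last two clauses
    "answer":    lambda s: [s[0], s[-1]],                  # first and last
    "statement": lambda s: [s[0], s[len(s) // 2], s[-1]],  # head, middle, tail
}

def extract_key_sentence(clauses, sentence_type):
    picked = EXTRACTION_RULES[sentence_type](clauses)
    # Splice in original order, dropping duplicates from very short inputs.
    seen, ordered = set(), []
    for c in clauses:
        if c in picked and c not in seen:
            seen.add(c)
            ordered.append(c)
    return " ".join(ordered)

key = extract_key_sentence(
    ["My card is blocked", "I tried yesterday", "Please unblock it"],
    "statement",
)
```

As the text notes, the rules are not fixed: a user can register a different rule per scenario simply by replacing the entry in the table.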
And the extracting module 130 is configured to, when it is determined that the sentence length of the key sentence is greater than a second threshold, extract the trunk words of the key sentence, and splice the trunk words to obtain the target sentence.
In this embodiment, the extracting the stem words of the key sentence includes:
c21, performing word segmentation processing on the key sentence to obtain a third word sequence;
and C22, sequentially identifying the part of speech of each word in the third word sequence, determining the syntactic structure of the third word sequence based on the part of speech and a preset syntactic analysis strategy, and extracting the stem words in the third word sequence based on the syntactic structure.
In this embodiment, the parts of speech include nouns, verbs, adjectives, prepositions, negative words, adverbs, auxiliary words, and the like.
The preset syntactic analysis strategy is dependency syntactic analysis, and the determining of the syntactic structure of the third word sequence based on the part of speech and the preset syntactic analysis strategy comprises:
d21, determining a core word in the third word sequence based on the part of speech;
Usually the verb is the core word (a sentence typically contains only one predicate verb).
D22, determining the membership among the words in the third word sequence;
For example, if the key sentence is "I eat one big apple", the third word sequence is {I, eat, one, big, apple} and its core word is "eat".
In the dependency analysis, "one" depends on "apple", and "big" also depends on "apple".
D23, determining the syntactic structure of the third word sequence according to the core words and the affiliations.
The syntactic structure includes: a subject-verb relation, a verb-object relation, an indirect-object relation, a fronted-object relation, an attributive relation, an adverbial relation, a verb-complement relation, a coordinate relation, a preposition-object relation, a left-adjunct relation, and a right-adjunct relation.
The syntactic structure corresponding to the third word sequence {I, eat, one, big, apple} is {2: subject-verb relation, 6: verb-object relation, 2: verb-complement relation, 6: attributive relation, 6: attributive relation, 2: verb-object relation}, where each number is the index of the word's head.
The stem words extracted according to the syntax structure are 'I', 'eat' and 'apple'.
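The stem-word extraction over a dependency parse can be sketched as below. A real system would obtain the head indices and relation labels from a dependency parser (e.g., LTP or spaCy), so the hard-coded parse here is only an illustration of the worked example:

```python
# Toy dependency parse for "I eat one big apple": each word carries the
# 1-based index of its head word and its dependency relation label.
words = ["I", "eat", "one", "big", "apple"]
parse = [
    (2, "SBV"),  # I     -> eat   (subject-verb)
    (0, "HED"),  # eat   -> root  (core word)
    (5, "ATT"),  # one   -> apple (attributive)
    (5, "ATT"),  # big   -> apple (attributive)
    (2, "VOB"),  # apple -> eat   (verb-object)
]

# Keep the core word plus its direct subject and object; drop modifiers
# such as attributives and adverbials.
STEM_RELATIONS = {"HED", "SBV", "VOB"}
stem = [w for w, (head, rel) in zip(words, parse) if rel in STEM_RELATIONS]
compressed = " ".join(stem)
```

The choice of which relations count as "stem" (here HED, SBV, VOB) is an assumption; the patent only specifies that the stem words are extracted according to the syntactic structure.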
In this embodiment, after determining whether the sentence length of the key sentence is greater than a second threshold, the extracting module 130 is further configured to:
and if the sentence length of the key sentence is judged to be less than or equal to a second threshold value, taking the key sentence as a target sentence.
As can be seen from the foregoing embodiment, the sentence compression apparatus 100 provided by the present invention first performs spoken-language removal on the sentence to be compressed to obtain the target sentence set; this removes spoken sentences and spoken words without semantic information, achieving a preliminary compression. Second, when the number of sentences in the target sentence set is greater than the first threshold, the sentences are ranked by importance and key sentences are extracted from the ranking result, which further removes redundant information while retaining the semantic information of the sentence to be compressed. Finally, when the sentence length of the key sentence is greater than the second threshold, the stem words of the key sentence are extracted and spliced to obtain the target sentence. The invention thereby reduces labeling cost while preserving the semantic accuracy of the compressed sentence.
Fig. 3 is a schematic structural diagram of an electronic device for implementing a statement compression method according to an embodiment of the present invention.
The electronic device 1 is a device capable of automatically performing numerical calculation and/or information processing in accordance with instructions that are set or stored in advance. The electronic device 1 may be a computer, a single network server, a server group composed of a plurality of network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
In the embodiment, the electronic device 1 includes, but is not limited to, a memory 11, a processor 12, and a network interface 13, which are communicatively connected to each other through a system bus, where the memory 11 stores a statement compression program 10, and the statement compression program 10 is executable by the processor 12. Fig. 3 only shows the electronic device 1 with the components 11-13 and the sentence compression program 10, and it will be understood by those skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or a combination of certain components, or a different arrangement of components.
The memory 11 includes an internal memory and at least one type of readable storage medium. The internal memory provides a cache for the operation of the electronic device 1; the readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Programmable Read-Only Memory (PROM), a magnetic memory, a magnetic disk, or an optical disk. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as its hard disk; in other embodiments, it may be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card. In this embodiment, the readable storage medium of the memory 11 is generally used to store the operating system and the application software installed in the electronic device 1, for example, the code of the statement compression program 10 in an embodiment of the present invention. The memory 11 may also be used to temporarily store data that has been output or is to be output.
Processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 12 is generally configured to control the overall operation of the electronic device 1, such as performing control and processing related to data interaction or communication with other devices. In this embodiment, the processor 12 is configured to run the program code stored in the memory 11 or process data, for example, run the statement compression program 10.
The network interface 13 may comprise a wireless network interface or a wired network interface, and the network interface 13 is used for establishing a communication connection between the electronic device 1 and a client (not shown).
Optionally, the electronic device 1 may further include a user interface, the user interface may include a Display (Display), an input unit such as a Keyboard (Keyboard), and the optional user interface may further include a standard wired interface and a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The statement compression program 10 stored in the memory 11 of the electronic device 1 is a combination of a plurality of instructions, which when executed in the processor 12, can implement:
analyzing a statement compression request sent by a user based on a client, acquiring a to-be-compressed statement carried by the request, performing spoken language removal processing on the to-be-compressed statement to obtain a target statement set, and judging whether the number of sentences in the target statement set is greater than a first threshold value or not;
when the number of sentences in the target sentence set is judged to be larger than a first threshold value, sorting the importance of the sentences in the target sentence set, extracting key sentences based on a sorting result, and judging whether the sentence length of the key sentences is larger than a second threshold value;
and when the sentence length of the key sentence is judged to be larger than a second threshold value, extracting the trunk words of the key sentence, and splicing the trunk words to obtain the target sentence.
Specifically, the processor 12 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1 for a specific implementation method of the statement compression program 10, which is not described herein again.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. The computer-readable medium may be non-volatile or volatile. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U-disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).
The computer-readable storage medium stores a sentence compression program 10, and the sentence compression program 10 can be executed by one or more processors, and the specific implementation of the computer-readable storage medium of the present invention is substantially the same as the embodiments of the sentence compression method, which is not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. Terms such as first and second are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A method of sentence compression, the method comprising:
analyzing a statement compression request sent by a user based on a client, acquiring a to-be-compressed statement carried by the request, performing spoken language removal processing on the to-be-compressed statement to obtain a target statement set, and judging whether the number of sentences in the target statement set is greater than a first threshold value or not;
when the number of sentences in the target sentence set is judged to be larger than a first threshold value, sorting the importance of the sentences in the target sentence set, extracting key sentences based on a sorting result, and judging whether the sentence length of the key sentences is larger than a second threshold value;
and when the sentence length of the key sentence is judged to be larger than a second threshold value, extracting the trunk words of the key sentence, and splicing the trunk words to obtain the target sentence.
2. The sentence compression method of claim 1, wherein the performing of the spoken language removal processing on the sentence to be compressed comprises:
acquiring a spoken sentence dictionary from a first database, comparing each first clause in the sentences to be compressed with the spoken sentence dictionary, and deleting a specified first clause if one specified first clause is matched with one sentence in the spoken sentence dictionary to obtain an initial sentence set;
performing word segmentation processing on the sentences in the initial sentence set to obtain a first word sequence;
recognizing the spoken words in the first word sequence based on a spoken word recognition model, and deleting the spoken words to obtain a second word sequence;
and splicing the words in the second word sequence according to the positions of the words in the sentence to be compressed to obtain a plurality of second clauses, and taking the set of the second clauses as a target sentence set.
3. The sentence compression method of claim 1, wherein the sorting of the importance of the sentences in the target sentence set and the extracting of the key sentences based on the sorting result comprises:
combining each sentence in the target sentence set with other sentences pairwise to obtain a plurality of combination pairs;
calculating similarity values of two sentences of each combination pair in the plurality of combination pairs, and determining a similarity matrix corresponding to the target sentence set based on the similarity values;
and calculating the importance scores of the sentences in the target sentence set based on the similarity matrix, sequencing the sentences in the target sentence set according to the sequence of the importance scores from high to low, and taking the sentence with the highest sequence as a key sentence.
4. The sentence compression method of claim 1 wherein after determining whether the number of sentences in the target sentence set is greater than a first threshold, the method further comprises:
if the number of sentences in the target sentence set is judged to be less than or equal to a first threshold value, determining the sentence type of the sentence to be compressed, acquiring an extraction rule corresponding to the sentence type from a second database, extracting the sentences from the target sentence set based on the extraction rule, and splicing the extracted sentences to obtain a key sentence.
5. The sentence compression method of claim 1, wherein the extracting stem words of the key sentence comprises:
performing word segmentation processing on the key sentence to obtain a third word sequence;
sequentially identifying the part of speech of each word in the third word sequence, determining a syntactic structure of the third word sequence based on the part of speech and a preset syntactic analysis strategy, and extracting the stem word in the third word sequence based on the syntactic structure.
6. The sentence compression method of any one of claims 1-5, wherein after determining whether the sentence length of the key sentence is greater than a second threshold, the method further comprises:
and if the sentence length of the key sentence is judged to be less than or equal to a second threshold value, taking the key sentence as a target sentence.
7. A sentence compression apparatus, the apparatus comprising:
the analysis module is used for analyzing a statement compression request sent by a user based on a client, acquiring a statement to be compressed carried by the request, executing spoken language removal processing on the statement to be compressed to obtain a target statement set, and judging whether the number of the sentences in the target statement set is greater than a first threshold value or not;
the sorting module is used for sorting the importance of the sentences in the target sentence set when the number of the sentences in the target sentence set is judged to be larger than a first threshold value, extracting key sentences based on a sorting result, and judging whether the sentence length of the key sentences is larger than a second threshold value or not;
and the extraction module is used for extracting the trunk words of the key sentences and splicing the trunk words to obtain the target sentences when the sentence length of the key sentences is judged to be larger than a second threshold value.
8. The sentence compression apparatus of claim 7 wherein the ranking of the importance of the sentences in the target sentence set and the extracting of the key sentences based on the ranking result comprises:
combining each sentence in the target sentence set with other sentences pairwise to obtain a plurality of combination pairs;
calculating similarity values of two sentences of each combination pair in the plurality of combination pairs, and determining a similarity matrix corresponding to the target sentence set based on the similarity values;
and calculating the importance scores of the sentences in the target sentence set based on the similarity matrix, sequencing the sentences in the target sentence set according to the sequence of the importance scores from high to low, and taking the sentence with the highest sequence as a key sentence.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and the number of the first and second groups,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a sentence compression program executable by the at least one processor, the sentence compression program being executable by the at least one processor to enable the at least one processor to perform the sentence compression method of any of claims 1-6.
10. A computer-readable storage medium having stored thereon a sentence compression program executable by one or more processors to implement the sentence compression method of any of claims 1-6.
CN202011386421.4A 2020-12-01 2020-12-01 Statement compression method and device, electronic equipment and readable storage medium Pending CN112434515A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011386421.4A CN112434515A (en) 2020-12-01 2020-12-01 Statement compression method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011386421.4A CN112434515A (en) 2020-12-01 2020-12-01 Statement compression method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN112434515A true CN112434515A (en) 2021-03-02

Family

ID=74697605

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011386421.4A Pending CN112434515A (en) 2020-12-01 2020-12-01 Statement compression method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112434515A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105989058A (en) * 2015-02-06 2016-10-05 北京中搜网络技术股份有限公司 Chinese news brief generating system and method
CN107451139A (en) * 2016-05-30 2017-12-08 北京三星通信技术研究有限公司 File resource methods of exhibiting, device and corresponding smart machine
CN108470026A (en) * 2018-03-23 2018-08-31 北京奇虎科技有限公司 The sentence trunk method for extracting content and device of headline
CN108829894A (en) * 2018-06-29 2018-11-16 北京百度网讯科技有限公司 Spoken word identification and method for recognizing semantics and its device
CN110413986A (en) * 2019-04-12 2019-11-05 上海晏鼠计算机技术股份有限公司 A kind of text cluster multi-document auto-abstracting method and system improving term vector model
CN111444703A (en) * 2020-03-04 2020-07-24 中国平安人寿保险股份有限公司 Statement compression method, device, equipment and computer readable storage medium
US20200312297A1 (en) * 2019-03-28 2020-10-01 Wipro Limited Method and device for extracting factoid associated words from natural language sentences


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
刘文锋: "Research on Automatic Text Summarization Methods Based on Representation Learning and Dependency Syntax", China Doctoral Dissertations Full-text Database, Information Science and Technology, 15 August 2020 (2020-08-15) *
吴仁守: "Research on Short-Text Summarization Generation Based on Text Structure Information", China Masters' Theses Full-text Database, Information Science and Technology, 15 April 2020 (2020-04-15), page 2 *
吴玉林: "Research and Implementation of Topic-Model-Based Multi-Document Automatic Summarization", China Masters' Theses Full-text Database, Information Science and Technology, 15 June 2020 (2020-06-15), pages 9-11 *

Similar Documents

Publication Publication Date Title
CN111581976B (en) Medical term standardization method, device, computer equipment and storage medium
CN111460787B (en) Topic extraction method, topic extraction device, terminal equipment and storage medium
JP5936698B2 (en) Word semantic relation extraction device
WO2022121171A1 (en) Similar text matching method and apparatus, and electronic device and computer storage medium
CN112541056B (en) Medical term standardization method, device, electronic equipment and storage medium
WO2022078308A1 (en) Method and apparatus for generating judgment document abstract, and electronic device and readable storage medium
CN110162771B (en) Event trigger word recognition method and device and electronic equipment
CN110083832B (en) Article reprint relation identification method, device, equipment and readable storage medium
CN108427702B (en) Target document acquisition method and application server
JP2002215619A (en) Translation sentence extracting method from translated document
US11170169B2 (en) System and method for language-independent contextual embedding
CN114330335B (en) Keyword extraction method, device, equipment and storage medium
CN112329460A (en) Text topic clustering method, device, equipment and storage medium
CN107577663A (en) A kind of key-phrase extraction method and apparatus
CN111177375A (en) Electronic document classification method and device
US20130024403A1 (en) Automatically induced class based shrinkage features for text classification
CN114220505A (en) Information extraction method of medical record data, terminal equipment and readable storage medium
WO2019085118A1 (en) Topic model-based associated word analysis method, and electronic apparatus and storage medium
EP3425531A1 (en) System, method, electronic device, and storage medium for identifying risk event based on social information
CN114118072A (en) Document structuring method and device, electronic equipment and computer readable storage medium
CN109241281B (en) Software failure reason generation method, device and equipment
CN114969385B (en) Knowledge graph optimization method and device based on document attribute assignment entity weight
WO2022141860A1 (en) Text deduplication method and apparatus, electronic device, and computer readable storage medium
CN114398877A (en) Theme extraction method and device based on artificial intelligence, electronic equipment and medium
CN112434515A (en) Statement compression method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination