WO2013107308A1

WO2013107308A1 - Method and apparatus for aggregating information

Info

Publication number: WO2013107308A1
Application number: PCT/CN2013/070146
Authority: WO
Inventors: 黄波
Original assignee: 华为终端有限公司
Priority date: 2012-01-20
Filing date: 2013-01-07
Publication date: 2013-07-25
Also published as: CN103218372A; CN103218372B

Abstract

Disclosed are a method and an apparatus for aggregating information, belonging to the field of information identification. The method comprises: obtaining a text to be aggregated; obtaining a location label of an information amount in the text; according to the location label, calculating the distance between every two information amounts; when a first distance is equal to a second distance, correcting the first distance and the second distance according to a grammatical structure, the first distance being the distance between a first information amount and a second information amount of the information amounts, and the second distance being the distance between the first information amount and a third information amount of the information amounts; and aggregating the information amounts according to the corrected first distance and second distance, to obtain a structural body.

Description

The present application claims priority to the Chinese patent application entitled "Method and Apparatus for Aggregating Information", Application No. 20121001894, filed with the Chinese Patent Office on January 20, 2012, which is incorporated herein by reference.

Technical field

The present invention relates to the field of information recognition, and in particular, to a method and apparatus for aggregating information.

Background

Information aggregation combines different pieces of information that are intrinsically linked into a structure. Take a person's name, phone number, and email address: if they all belong to the same person's data, they can be combined into one larger block of information, forming a structure: (person name, phone number, email address). With information aggregation technology, multi-source information can be used to provide users with a one-stop personalized service.

Information aggregation is an important part of information extraction. Its core is the choice of a quantifiable standard: different metrics change the outcome of aggregation and therefore the final result of information extraction. A common aggregation method is the location labeling method. It first locates the words in the text so that each information amount has a unique location label in the text, then uses the location labels to obtain distances, where a distance expresses how closely two information amounts are related, and finally aggregates the information amounts according to this quantified closeness to obtain a structure.

In the process of implementing the present invention, the inventors found that the prior art has at least the following problem. The location labeling method in the prior art provides a quantitative standard between information amounts, but it considers only the positions of the information amounts and the distances between them, and quantifies closeness by distance alone. When an information amount is equally distant from the information amount before it and the information amount after it, the location labeling method offers no rigorous solution. If aggregation is then performed at random, aggregating the information amount with the preceding one and aggregating it with the following one may give completely different aggregation results, so the result may be inaccurate. Supplying an inaccurate result to the subsequent information extraction process affects the accuracy of the entire extraction.

Summary of the invention

Embodiments of the present invention provide a method and an apparatus for aggregating information. The technical solution is as follows:

A method for aggregating information, the method comprising:

obtaining a text to be aggregated;

obtaining a location label of each information amount in the text;

calculating, according to the location labels, a distance between every two information amounts;

when a first distance and a second distance are equal, correcting the first distance and the second distance according to a grammatical structure, where the first distance is the distance between a first information amount and a second information amount among the information amounts, and the second distance is the distance between the first information amount and a third information amount among the information amounts; and

aggregating the information amounts according to the corrected first distance and second distance to obtain a structure.

An apparatus for aggregating information, the apparatus comprising:

a text obtaining module, configured to obtain a text to be aggregated;

a location label obtaining module, configured to obtain a location label of each information amount in the text;

a calculation module, configured to calculate, according to the location labels, a distance between every two information amounts;

a correction module, configured to: when a first distance and a second distance are equal, correct the first distance and the second distance according to a grammatical structure, where the first distance is the distance between a first information amount and a second information amount among the information amounts, and the second distance is the distance between the first information amount and a third information amount among the information amounts; and

an aggregation module, configured to aggregate the information amounts according to the corrected first distance and second distance to obtain a structure.

Embodiments of the present invention provide a method and an apparatus for aggregating information that: obtain a location label of each information amount in the text; calculate a distance between every two information amounts according to the location labels; when a first distance and a second distance are equal, correct the first distance and the second distance according to a grammatical structure, where the first distance is the distance between a first information amount and a second information amount among the information amounts, and the second distance is the distance between the first information amount and a third information amount among the information amounts; and aggregate the information amounts according to the corrected first distance and second distance to obtain a structure. In the embodiments of the present invention, when the distances between information amounts are equal, the distances are corrected according to the grammatical structure, and the information amounts are aggregated according to the corrected distances. Taking the grammatical structure into account on top of location-label-based aggregation improves the accuracy of information aggregation.

Brief description of the drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them.

FIG. 1 is a flowchart of a method for aggregating information according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for aggregating information according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an apparatus for aggregating information according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an apparatus for aggregating information according to an embodiment of the present invention.

Detailed description

The embodiments of the present invention will be further described in detail below with reference to the accompanying drawings.

FIG. 1 is a flowchart of a method for aggregating information according to an embodiment of the present invention. This embodiment can be implemented on terminals such as mobile phones, personal computers, and tablets, and can also be applied to servers, for example to automatically aggregate information of interest to users when monitoring their emails or short messages. Referring to FIG. 1, the embodiment specifically includes:

101. Obtain the text to be aggregated.

In this embodiment, the text may be data including a character string, a punctuation mark, a line feed, and the like.

It should be noted that the text may be text currently received by the terminal, or text that the terminal user has already saved on the terminal. This embodiment uses text currently received by the terminal as the example. The text may be a user's email or short message, or another file, which is not limited in this embodiment of the present invention.

102. Obtain a location label of each information amount in the text.

An information amount is a character string in the text that has a particular attribute and meaning, for example a person's name, a phone number, or an email address. These strings are the information that is useful for information extraction, or the information that users pay attention to. Besides a person's name, phone number, and email address, they can also be conference topics, meeting locations, meeting content, and so on. In practical applications, word segmentation techniques can be used to first divide the continuous string of each sentence in the text into separate words, and then determine whether each of these words is an information amount that needs attention. For example, categories of information that need attention can be predefined, the segmented words classified, and each word then judged to be an information amount of interest or not according to its category. Other ways can also be used to identify the information amounts in the text. For example, a vocabulary of words that need attention can be set, and the contents of the text filtered against this vocabulary to find the information amounts that need attention.

Of course, there are many other ways to identify the amount of information in the file, which is not limited in this embodiment of the present invention.
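The two identification approaches described above (predefined categories and a preset vocabulary) can be sketched roughly as follows. This is only an illustration: the regular expressions, the category names, the `PRESET_KEYWORDS` set, and the function name `find_information_amounts` are assumptions made for the example and are not part of the embodiment, and a real implementation for Chinese text would rely on a word segmenter rather than on regular expressions.

```python
import re

# Hypothetical categories of interest; the patterns are illustrative only.
CATEGORY_PATTERNS = {
    "phone": re.compile(r"\d{8,11}"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}

# Hypothetical preset vocabulary that the user cares about.
PRESET_KEYWORDS = {"Xiao Ming", "phone number"}


def find_information_amounts(text):
    """Return (string, category, character index) for each match, in text order."""
    found = []
    # Approach 1: classify substrings by predefined category.
    for category, pattern in CATEGORY_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.group(), category, match.start()))
    # Approach 2: filter the text against a preset vocabulary.
    for keyword in PRESET_KEYWORDS:
        for match in re.finditer(re.escape(keyword), text):
            found.append((keyword, "keyword", match.start()))
    return sorted(found, key=lambda item: item[2])


items = find_information_amounts(
    "Xiao Ming went to Beijing for a business trip today. "
    "His phone number is 12345678."
)
for string, category, index in items:
    print(string, category, index)
```

The character index returned here is only a convenience for ordering the matches; the location label used by the method is described next.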

The method of this embodiment applies when there are three or more information amounts. Each information amount has a unique position in the text, and in this embodiment that position is identified by a location label. Preferably, the location label includes the natural paragraph position of the information amount in the text, a start position, and an end position, and may take the form (paragraph position, start position, end position). The paragraph position is the number of the natural paragraph that contains the information amount: 1 if it is in the first paragraph of the text, 2 if it is in the second, and so on. The maximum number of characters in a paragraph is a constant, recorded as max_size, which is usually the largest character count over all paragraphs in the text. For example, if the text has three paragraphs of 100, 500, and 1000 bytes, then max_size = max(100, 500, 1000) = 1000. The start position and end position are the coordinates, within the paragraph, at which the information amount begins and ends.

For example, take the paragraph "Xiao Ming went to Beijing for a business trip today. His phone number is 12345678.", which is the nth paragraph of the text. The positions are counted on the original Chinese sentence, in which "Xiao Ming" (小明) is two characters. In the GB2312 encoding, each Chinese character occupies two positions (for example, bytes) and each digit occupies one position; the starting position is 1 and the ending position is 23. Note that the start position and end position of an information amount also depend on the encoding used for the paragraph. For example, in ASCII encoding each English character occupies one byte.

The information amounts and their location labels are then as follows:

Xiao Ming (n, 1, 4), he (n, 21, 22), telephone (n, 25, 28), 12345678 (n, 31, 38).
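As a rough illustration of how such labels could be derived, the sketch below counts byte positions under the GB2312 encoding using Python's built-in `gb2312` codec. The Chinese sentence is an assumed rendering of the example paragraph (the original Chinese text is not given here), and the function name `location_label` and the paragraph number `n = 1` are hypothetical.

```python
def location_label(paragraph_text, item, paragraph_no, encoding="gb2312"):
    """Return (paragraph position, start position, end position).

    Positions are 1-based and counted in bytes of the given encoding.
    Only the first occurrence of `item` in the paragraph is located.
    """
    char_index = paragraph_text.index(item)
    bytes_before = len(paragraph_text[:char_index].encode(encoding))
    item_bytes = len(item.encode(encoding))
    return (paragraph_no, bytes_before + 1, bytes_before + item_bytes)


# Assumed Chinese rendering of the example sentence (not taken from the text).
paragraph = "小明今天去北京出差。他的电话是12345678。"
n = 1  # hypothetical paragraph number

labels = {
    item: location_label(paragraph, item, n)
    for item in ("小明", "他", "电话", "12345678")
}
for item, label in labels.items():
    print(item, label)
```

Under these assumptions the computed start and end positions agree with the labels listed above: two positions per Chinese character or full-width punctuation mark, one per digit.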

103. Calculate a distance between each two information amounts according to the location label.

Specifically, the position label value of each information amount may first be calculated from its location label, and the distance between every two information amounts is then calculated from the position label values. The position label value is calculated as:

position label value = paragraph position × maximum number of characters in a paragraph + (start position + end position) / 2

The distance between two information amounts is calculated as:

distance = |L(x) - L(y)|

where L(x) and L(y) are the position label values of information amount x and information amount y, respectively.
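The two formulas can be written out directly. The following is a minimal sketch in which the function names `label_value` and `distance`, the paragraph number `n = 2`, and the paragraph lengths are assumptions chosen for illustration; the labels are the ones from the example in step 102.

```python
def label_value(label, max_size):
    """Position label value: paragraph * max_size + (start + end) / 2."""
    paragraph, start, end = label
    return paragraph * max_size + (start + end) / 2


def distance(label_x, label_y, max_size):
    """Distance between two information amounts: |L(x) - L(y)|."""
    return abs(label_value(label_x, max_size) - label_value(label_y, max_size))


# max_size is the largest paragraph length in the text (hypothetical lengths).
max_size = max(100, 500, 1000)

n = 2  # hypothetical paragraph number
xiao_ming = (n, 1, 4)
he = (n, 21, 22)
telephone = (n, 25, 28)

print(distance(xiao_ming, he, max_size))   # 19.0
print(distance(he, telephone, max_size))   # 5.0
```

Multiplying the paragraph position by max_size keeps information amounts in different paragraphs further apart than any two information amounts inside the same paragraph, which is presumably why max_size is taken as the largest paragraph length.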

104. When a first distance and a second distance are equal, correct the first distance and the second distance according to a grammatical structure, where the first distance is the distance between a first information amount and a second information amount among the information amounts, and the second distance is the distance between the first information amount and a third information amount among the information amounts.

The first, second, and third information amounts merely refer to any three of the acquired information amounts that have the positional relationship described in this embodiment. The grammatical structure refers to the vocabulary attributes or sentence components of the first, second, and third information amounts.

It can be understood that when the first distance is greater than the second distance, the third information amount is closer to the first information amount and the second information amount is further from it, so during aggregation the first information amount is aggregated with the third information amount to obtain a structure. When the first distance is smaller than the second distance, the third information amount is further from the first information amount and the second information amount is closer to it, so during aggregation the first information amount is aggregated with the second information amount to obtain a structure.

When, for the first information amount, the first distance and the second distance are equal, the first distance and the second distance are corrected according to the vocabulary attributes or sentence components determined for the first, second, and third information amounts, to avoid inaccurate aggregation caused by the two distances being equal.
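The three cases above (first distance greater, smaller, or equal) amount to a simple comparison. A minimal sketch follows, in which the function name `nearer_neighbour` and the returned strings are hypothetical, and `math.isclose` is used only as a cautious way to compare floating-point distances.

```python
import math


def nearer_neighbour(first_distance, second_distance):
    """Decide which neighbour the first information amount should join.

    first_distance:  distance between the first and second information amounts
    second_distance: distance between the first and third information amounts
    Returns "second", "third", or "tie" when a grammatical correction is needed.
    """
    if math.isclose(first_distance, second_distance):
        return "tie"
    return "second" if first_distance < second_distance else "third"


print(nearer_neighbour(5, 19))   # second
print(nearer_neighbour(19, 5))   # third
print(nearer_neighbour(3, 3))    # tie
```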

105. Aggregate the information amounts according to the corrected first distance and second distance to obtain an aggregated structure.

The specific aggregation process is the same as in the prior art. Aggregation means classifying and sorting the information amounts, so that the subsequent information extraction process feeds back classified and sorted information to the user rather than unordered information. "Structure" is a general term for the result of aggregating the information amounts: when there are many information amounts, they are classified and sorted, and structures arranged or combined according to preset rules are returned.

In an actual application, after the terminal device obtains the foregoing structure, the structure may be saved in a corresponding file, and/or directly displayed to an end user or a server user for user selection and the like.

In the method provided by this embodiment, when the distances between information amounts are equal, the distances are corrected according to the grammatical structure, and the information amounts are aggregated according to the corrected distances. Taking the grammatical structure into account on top of location-label-based aggregation improves the accuracy of information aggregation and the performance of subsequent information extraction.

FIG. 2 is a flowchart of a method for aggregating information according to an embodiment of the present invention. Referring to FIG. 2, the embodiment specifically includes:

201. Acquire text to be aggregated;

The text in the step 201 is the same as that in the step 101, and details are not described herein again.

Specifically, after the text is received, the characters in the text are recognized according to a stored dictionary. Recognition enables the terminal to read the characters in the text and compose them into words or sentences, and the subsequent steps are performed on the recognized words or sentences.

202. Obtain three or more information amounts according to preset keywords;

In this embodiment, the terminal acquires three or more information amounts in the text according to the preset keyword, and the three or more information amounts may be words, numbers, letters, and the like.

It can be understood that this embodiment is described using three or more information amounts as an example. In other embodiments, when only one information amount is obtained, no aggregation is needed and that information amount may itself be used as a structure; when two information amounts are obtained, they may be aggregated according to the existing aggregation principle to obtain a structure.

It should be noted that the triggering of the acquisition of the information volume may include, but is not limited to, the following situations:

(1) The terminal processes text as it is received: when a text is received, the information amounts in the text are obtained and aggregated, and the aggregated structure may be saved in a corresponding file and/or displayed directly to the terminal user or server user for selection and other operations.

(2) The terminal processes locally saved text at regular intervals: at each interval, the information amounts in the text are obtained and aggregated, and the aggregated structure may be saved in a corresponding file and/or displayed directly to the terminal user or server user for selection and other operations.

203. Obtain lexical attributes of the three or more information quantities in the text, and obtain the sentence components of the three or more information quantities according to the obtained attributes;

The vocabulary attribute refers to noun, adjective, verb, adverb, and so on, and the sentence component refers to subject, predicate, object, and so on. Taking Chinese grammar as an example, an information amount whose vocabulary attribute is noun can generally serve as a subject or an object, and one whose vocabulary attribute is verb can serve as a predicate. In this embodiment, the information amounts in the text are analyzed against the vocabulary attributes defined in a Chinese grammar library to obtain the vocabulary attribute of each information amount; the sentence component of each information amount is then obtained from its vocabulary attribute and the classification or definition of that attribute in the Chinese grammar library.

204. Obtain a location label of each information amount in the text;

This step 204 is the same as step 102, and details are not described herein again.

205. Calculate a distance between each two information amounts according to the obtained location label.

The location label gives the coordinates of an information amount in the text, and the position label value of the information amount follows from it. For the example in step 102, the position label values are:

L(Xiao Ming) = n × max_size + 5/2

L(he) = n × max_size + 43/2

L(telephone) = n × max_size + 53/2

L(12345678) = n × max_size + 69/2

Thus, the distances between these information amounts are:

d(Xiao Ming, he) = 19

d(he, telephone) = 5

d(telephone, 12345678) = 8

206. When a first distance and a second distance are equal, correct the first distance and the second distance according to a grammatical structure, where the first distance is the distance between a first information amount and a second information amount among the information amounts, and the second distance is the distance between the first information amount and a third information amount among the information amounts;

Within a text, the first distance and the second distance being equal means that the second information amount and the third information amount are located before and after the first information amount, respectively.

The grammatical structure refers to a vocabulary attribute or a sentence component of the first information amount, the second information amount, and the third information amount.

When the first distance and the second distance are equal, the closeness between the first information amount and the second information amount and the closeness between the first information amount and the third information amount are obtained according to the grammatical structure, that is, the sentence components or vocabulary attributes of the first, second, and third information amounts, and the first distance and the second distance are corrected according to the obtained closeness.

Steps 203 to 206 of this embodiment are described using the example of obtaining the vocabulary attributes first and then deriving the sentence components from them. Alternatively, in another embodiment, step 203 may be replaced by: obtaining the vocabulary attributes of the information amounts in the text. Correspondingly, step 206 is replaced by: when the first distance and the second distance are equal, correcting the first distance and the second distance according to the grammatical structure and the vocabulary attributes of the first, second, and third information amounts. Specifically, when the first distance and the second distance are equal, the closeness between the first information amount and the second information amount and the closeness between the first information amount and the third information amount are obtained according to the grammatical structure and the vocabulary attributes of the first, second, and third information amounts, and the first distance and the second distance are corrected according to the obtained closeness.

The terminal may pre-store the correspondence between sentence components, vocabulary attributes, and closeness, and obtain the closeness for an information amount from its sentence component or vocabulary attribute. The closeness values may be set with reference to the grammar of the language: different sentence components correspond to different closeness values, as do different vocabulary attributes. The specific values can be set by a technician and are not specifically limited in this embodiment.

The closeness corresponding to the sentence component or vocabulary attribute determined for each information amount is obtained, and the distances between information amounts are corrected according to that closeness. The specific correction process may include the following. When the closeness between the first information amount and the second information amount is greater than the closeness between the first information amount and the third information amount, a disturbance value is subtracted from the first distance and/or a disturbance value is added to the second distance, so that the corrected first distance and second distance are no longer equal, and the information is aggregated according to the corrected first distance and second distance. When the closeness between the first information amount and the second information amount is less than the closeness between the first information amount and the third information amount, a disturbance value is added to the first distance and/or a disturbance value is subtracted from the second distance, so that the corrected first distance and second distance are no longer equal, and the information is aggregated according to the corrected first distance and second distance. The disturbance value can be adjusted for different grammatical components, and an appropriate value is selected to ensure that the distances between information amounts are unique. It should be noted that the difference in closeness may also be expressed in other ways, such as multiplying or dividing by a disturbance coefficient, as long as the corrected first distance and second distance are no longer equal and reflect the difference in closeness.

Correcting the distances between information amounts according to the grammatical structure takes into account both the "before/after" and the "far/near" relationships between information amounts, and redefines the distance between information amounts by adding or subtracting a disturbance amount.
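The correction rule can be sketched as follows, in both the additive form and the multiplicative form mentioned above. Everything named here is an assumption made for illustration: the `CLOSENESS` table and its values, the function names, the default disturbance of 0.25, and the coefficient of 0.9. The embodiment allows the disturbance to be applied to either distance or to both; this sketch applies it to both.

```python
# Hypothetical pre-stored closeness values, keyed by the vocabulary attribute
# of the neighbouring information amount (larger means closer).
CLOSENESS = {"modifier": 2, "verb": 1}


def correct_additive(first, second, closeness_second, closeness_third,
                     disturbance=0.25):
    """Break a tie by subtracting/adding a disturbance value."""
    if first != second or closeness_second == closeness_third:
        return first, second  # no tie, or no basis for a correction
    if closeness_second > closeness_third:
        return first - disturbance, second + disturbance
    return first + disturbance, second - disturbance


def correct_multiplicative(first, second, closeness_second, closeness_third,
                           coefficient=0.9):
    """Same idea with a disturbance coefficient, 0 < coefficient < 1."""
    if first != second or closeness_second == closeness_third:
        return first, second
    if closeness_second > closeness_third:
        return first * coefficient, second / coefficient
    return first / coefficient, second * coefficient


# Second amount is a modifier, third amount is a verb; both are at distance 3.
print(correct_additive(3, 3, CLOSENESS["modifier"], CLOSENESS["verb"]))
# Unequal distances are left untouched.
print(correct_additive(5, 19, CLOSENESS["modifier"], CLOSENESS["verb"]))
```

In either form the closer pair ends up with the smaller corrected distance, so the ordinary distance comparison can then proceed unchanged.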

207. Aggregate the three or more information amounts according to the corrected first distance and second distance to obtain an aggregated structure.

This step 207 is the same as step 105, and details are not described herein again.

Optionally, after step 207, the method further includes:

Upon receiving the extraction request for the amount of information, the terminal returns the aggregated information.

By aggregating the information and returning the aggregated information when an extraction request for an information amount or a preset keyword is received, the accuracy and efficiency of information extraction are improved.

In the method provided by this embodiment, when the distances between information amounts are equal, the distances are corrected according to the grammatical structure, and the information amounts are aggregated according to the corrected distances. Taking the grammatical structure into account on top of location-label-based aggregation improves the accuracy of information aggregation and the performance of subsequent information extraction.

Based on the embodiments provided by the present invention, an example is as follows. The text to be aggregated is: "Shanghai tap water comes from the sea".

The information amounts obtained from this text are: Shanghai, self, water, from, at sea. (In the original Chinese, "self" and "water" together form the word for "tap water".)

Only the aggregation of the information amount "water" is described here.

The information amounts before and after "water" are "self" and "from". The distance between "water" and "self" is the same as the distance between "water" and "from", so distance alone cannot determine which of the two "water" should be aggregated with.

For "water", "self" is a modifier and "from" is a verb. Determining closeness from the vocabulary attributes shows that a modifier is closer to a noun than a verb is, so the closeness of "self" to "water" is higher than that of "from" to "water". The corrected distance between "water" and "self" is therefore the original distance minus a positive disturbance, and the corrected distance between "water" and "from" is the original distance plus a positive disturbance. Selecting a suitable disturbance value, such as 0.25, makes the corrected distances from an information amount to its preceding and following information amounts differ, so that the corrected distance also describes the closeness between information amounts. With a disturbance of 0.25, the corrected distances are: d(self, water) = 3 - 0.25 = 2.75; d(water, from) = 3 + 0.25 = 3.25. The order of aggregation can then be judged from the corrected distances: "water" should be aggregated with "self". The result of the information aggregation is: Shanghai, tap water, from, at sea.
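The arithmetic of this example can be reproduced with a short script. The location labels below are assumptions: they are what a two-positions-per-character count would give if "self" and "from" are each two Chinese characters and "water" is one, following a two-character "Shanghai", and they are chosen so that both original distances equal 3 as stated above. `MAX_SIZE` and `PARAGRAPH` are likewise hypothetical; the disturbance of 0.25 is the value from the text.

```python
MAX_SIZE = 1000      # hypothetical maximum paragraph length
PARAGRAPH = 1        # hypothetical paragraph number
DISTURBANCE = 0.25   # disturbance value used in the example

# Assumed location labels: (paragraph position, start position, end position).
labels = {
    "self": (PARAGRAPH, 5, 8),
    "water": (PARAGRAPH, 9, 10),
    "from": (PARAGRAPH, 11, 14),
}


def label_value(label):
    """Position label value: paragraph * max_size + (start + end) / 2."""
    paragraph, start, end = label
    return paragraph * MAX_SIZE + (start + end) / 2


d_self = abs(label_value(labels["water"]) - label_value(labels["self"]))
d_from = abs(label_value(labels["water"]) - label_value(labels["from"]))
print(d_self, d_from)  # both 3.0 before correction

# "self" modifies the noun "water" and is therefore closer to it than the
# verb "from", so its distance shrinks and the other distance grows.
if d_self == d_from:
    d_self -= DISTURBANCE
    d_from += DISTURBANCE

partner = "self" if d_self < d_from else "from"
print(d_self, d_from, partner)  # 2.75 3.25 self
```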

FIG. 3 is a schematic structural diagram of an apparatus for aggregating information according to an embodiment of the present invention. Referring to Figure 3, the device includes:

a text obtaining module 301, configured to acquire text to be aggregated;

a location label obtaining module 302, configured to obtain a location label of each information amount in the text;

a calculation module 303, configured to calculate, according to the location labels, a distance between every two information amounts;

a correction module 304, configured to: when a first distance and a second distance are equal, correct the first distance and the second distance according to a grammatical structure, where the first distance is the distance between a first information amount and a second information amount among the information amounts, and the second distance is the distance between the first information amount and a third information amount among the information amounts; and

an aggregation module 305, configured to aggregate the information amounts according to the corrected first distance and second distance to obtain a structure.

Optionally, referring to FIG. 4, the apparatus further includes:

a vocabulary identification module 306, configured to acquire a vocabulary attribute of the amount of information in the text;

Correspondingly, the correction module 304 is further configured to: when the first distance and the second distance are equal, correct the first distance and the second distance according to a grammatical structure and the vocabulary attributes of the first information amount, the second information amount, and the third information amount;

Or,

The vocabulary identification module 306 is configured to acquire a vocabulary attribute of the information amount in the text, and determine a sentence component of the information amount according to the obtained attribute;

Correspondingly, the correction module 304 is further configured to: when the first distance and the second distance are equal, correct the first distance and the second distance according to a grammatical structure and the sentence components of the first information amount, the second information amount, and the third information amount.

The correction module 304 is specifically configured to: when the first distance and the second distance are equal, acquire the first information amount according to a grammatical structure and vocabulary attributes of the first information amount, the second information amount, and the third information amount The tightness between the second amount of information, the tightness between the first amount of information and the third amount of information, and correcting the first distance and the second distance according to the tightness of the acquisition;

The correction module 304 is further configured to: when the first distance and the second distance are equal, acquire, according to a grammatical structure and the sentence components of the first information amount, the second information amount, and the third information amount, the tightness between the first information amount and the second information amount and the tightness between the first information amount and the third information amount, and correct the first distance and the second distance according to the acquired tightness.
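The tie-breaking behaviour of the correction module can be illustrated with a minimal Python sketch. The tightness table, the attribute names ("name", "phone", "address"), the function names, and the rule of shifting each distance by a fixed step are all assumptions made for this example; this passage does not prescribe how tightness values are obtained or how large the correction is.

```python
# Hypothetical tightness scores between vocabulary attributes
# (higher = closer grammatical relationship). Values are illustrative only.
TIGHTNESS = {
    ("name", "phone"): 0.9,
    ("name", "address"): 0.4,
}


def tightness(attr_a: str, attr_b: str) -> float:
    """Look up tightness symmetrically; unknown pairs default to 0.0."""
    return TIGHTNESS.get((attr_a, attr_b), TIGHTNESS.get((attr_b, attr_a), 0.0))


def correct_distances(first_distance: float, second_distance: float,
                      attr_first: str, attr_second: str, attr_third: str,
                      step: float = 0.5) -> tuple[float, float]:
    """Break a tie between two equal distances using tightness.

    Assumed rule: the pair with the higher tightness has its distance
    reduced by `step`, and the other pair has its distance increased by
    `step`. Unequal distances, or equal tightness, are returned unchanged.
    """
    if first_distance != second_distance:
        return first_distance, second_distance
    t12 = tightness(attr_first, attr_second)  # first vs. second information amount
    t13 = tightness(attr_first, attr_third)   # first vs. third information amount
    if t12 > t13:
        return first_distance - step, second_distance + step
    if t13 > t12:
        return first_distance + step, second_distance - step
    return first_distance, second_distance


print(correct_distances(10.0, 10.0, "name", "phone", "address"))  # (9.5, 10.5)
```

With these assumed scores, the name is tied more tightly to the phone number than to the address, so the first distance becomes the smaller one and the aggregation step would group the name with the phone number.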

Preferably, the specific content of the location tag includes: a natural paragraph position, a start position, and an end position of the information amount in the text.

The formula used by the calculation module 303 to calculate the distance between every two information amounts is: distance = |L(x) - L(y)|, where L(x) and L(y) are the position label value of information amount x and the position label value of information amount y, respectively;

The position label value is calculated as: position label value = paragraph position × maximum number of characters in a paragraph + (start position + end position) / 2, where the paragraph position is the natural paragraph position of the information amount in the text.
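As an illustration, the position label value and the distance can be computed as in the following Python sketch. The names `LocationTag`, `position_label_value`, and `distance`, the sample positions, and the paragraph maximum of 1000 characters are assumptions introduced for this example and do not appear in the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationTag:
    """Location tag of one information amount (hypothetical container)."""
    paragraph: int  # natural paragraph position in the text
    start: int      # start position of the information amount
    end: int        # end position of the information amount


def position_label_value(tag: LocationTag, max_paragraph_chars: int) -> float:
    """paragraph position x paragraph maximum characters + (start + end) / 2"""
    return tag.paragraph * max_paragraph_chars + (tag.start + tag.end) / 2


def distance(x: LocationTag, y: LocationTag, max_paragraph_chars: int) -> float:
    """distance = |L(x) - L(y)|"""
    return abs(position_label_value(x, max_paragraph_chars)
               - position_label_value(y, max_paragraph_chars))


# Illustrative values only: a name and a phone number in paragraph 1,
# and an email address in paragraph 2.
name = LocationTag(paragraph=1, start=0, end=4)
phone = LocationTag(paragraph=1, start=10, end=20)
email = LocationTag(paragraph=2, start=0, end=16)

print(distance(name, phone, 1000))  # 13.0
print(distance(name, email, 1000))  # 1006.0
```

Multiplying the paragraph position by the paragraph maximum keeps information amounts in different paragraphs farther apart than any two information amounts within the same paragraph.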

In the device provided by this embodiment, when the distances between information amounts are equal, the distances are corrected according to the grammatical structure, and the information amounts are aggregated according to the corrected distances. Because aggregation is performed according to both the location label and the grammatical structure, the accuracy of information aggregation and the performance of subsequent information extraction are improved.

A person skilled in the art may understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing related hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.

The above is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalents, improvements, etc., made within the scope of the present invention shall be included in the scope of protection of the present invention.

Claims

1. A method for aggregating information, the method comprising: obtaining a text to be aggregated;

obtaining a position label of each information amount in the text; calculating a distance between every two information amounts according to the position label; when a first distance and a second distance are equal, correcting the first distance and the second distance according to a grammatical structure, wherein the first distance is a distance between a first information amount and a second information amount among the information amounts, and the second distance is a distance between the first information amount and a third information amount among the information amounts; and aggregating the information amounts according to the corrected first distance and second distance to obtain a structure.

2. The method according to claim 1, wherein, for correcting the first distance and the second distance according to a grammatical structure when the first distance and the second distance are equal, the method further comprises:

obtaining a vocabulary attribute of each information amount in the text; correspondingly, correcting the first distance and the second distance according to a grammatical structure when the first distance and the second distance are equal comprises: when the first distance and the second distance are equal, correcting the first distance and the second distance according to a grammatical structure and the vocabulary attributes of the first information amount, the second information amount, and the third information amount.

3. The method according to claim 1, wherein, for correcting the first distance and the second distance according to a grammatical structure when the first distance and the second distance are equal, the method further comprises: acquiring a vocabulary attribute of each information amount in the text, and determining a sentence component of the information amount according to the acquired vocabulary attribute; correspondingly, correcting the first distance and the second distance according to a grammatical structure when the first distance and the second distance are equal comprises: when the first distance and the second distance are equal, correcting the first distance and the second distance according to a grammatical structure and the sentence components of the first information amount, the second information amount, and the third information amount.

4. The method according to claim 2, wherein correcting the first distance and the second distance according to a grammatical structure and the vocabulary attributes of the first information amount, the second information amount, and the third information amount when the first distance and the second distance are equal specifically comprises: when the first distance and the second distance are equal, acquiring, according to the grammatical structure and the vocabulary attributes of the first information amount, the second information amount, and the third information amount, the tightness between the first information amount and the second information amount and the tightness between the first information amount and the third information amount, and correcting the first distance and the second distance according to the acquired tightness.

5. The method according to claim 3, wherein correcting the first distance and the second distance according to a grammatical structure and the sentence components of the first information amount, the second information amount, and the third information amount when the first distance and the second distance are equal specifically comprises: when the first distance and the second distance are equal, acquiring, according to the grammatical structure and the sentence components of the first information amount, the second information amount, and the third information amount, the tightness between the first information amount and the second information amount and the tightness between the first information amount and the third information amount, and correcting the first distance and the second distance according to the acquired tightness.

6. The method according to any one of claims 1 to 5, wherein the specific content of the position label comprises: a natural paragraph position, a start position, and an end position of the information amount in the text.

7. The method according to any one of claims 1 to 6, wherein the formula for calculating the distance between every two information amounts is: distance = |L(x) - L(y)|, where L(x) and L(y) are the position label value of information amount x and the position label value of information amount y, respectively;

the position label value is calculated as: position label value = paragraph position × paragraph maximum character number + (start position + end position) / 2, where the paragraph position is the natural paragraph position of the information amount in the text.

8. An apparatus for aggregating information, the apparatus comprising:

a text acquisition module, configured to acquire a text to be aggregated; a location label acquisition module, configured to acquire a location label of each information amount in the text;

a calculation module, configured to calculate a distance between every two information amounts according to the location label; a correction module, configured to correct a first distance and a second distance according to a grammatical structure when the first distance and the second distance are equal, wherein the first distance is a distance between a first information amount and a second information amount among the information amounts, and the second distance is a distance between the first information amount and a third information amount among the information amounts; and an aggregation module, configured to aggregate the information amounts according to the corrected first distance and second distance to obtain a structure.

9. The apparatus according to claim 8, wherein the apparatus further comprises: a vocabulary identification module, configured to acquire a vocabulary attribute of each information amount in the text;

correspondingly, the correction module is further configured to: when the first distance and the second distance are equal, correct the first distance and the second distance according to a grammatical structure and the vocabulary attributes of the first information amount, the second information amount, and the third information amount.

10. The apparatus according to claim 8, wherein the apparatus further comprises:

a vocabulary identification module, configured to acquire a vocabulary attribute of each information amount in the text, and determine a sentence component of the information amount according to the acquired vocabulary attribute;

correspondingly, the correction module is further configured to: when the first distance and the second distance are equal, correct the first distance and the second distance according to a grammatical structure and the sentence components of the first information amount, the second information amount, and the third information amount.

11. The apparatus according to claim 9, wherein the correction module is specifically configured to: when the first distance and the second distance are equal, acquire, according to a grammatical structure and the vocabulary attributes of the first information amount, the second information amount, and the third information amount, the tightness between the first information amount and the second information amount and the tightness between the first information amount and the third information amount, and correct the first distance and the second distance according to the acquired tightness.

12. The apparatus according to claim 10, wherein the correction module is further configured to: when the first distance and the second distance are equal, acquire, according to a grammatical structure and the sentence components of the first information amount, the second information amount, and the third information amount, the tightness between the first information amount and the second information amount and the tightness between the first information amount and the third information amount, and correct the first distance and the second distance according to the acquired tightness.

13. The apparatus according to any one of claims 8 to 12, wherein the specific content of the location label comprises: a natural paragraph position, a start position, and an end position of the information amount in the text.

14. The apparatus according to any one of claims 8 to 13, wherein the formula used by the calculation module to calculate the distance between every two information amounts is: distance = |L(x) - L(y)|, where L(x) and L(y) are the position label value of information amount x and the position label value of information amount y, respectively;