WO2013107297A1

WO2013107297A1 - Information aggregation method and device

Info

Publication number: WO2013107297A1
Application number: PCT/CN2013/070051
Authority: WO
Inventors: 刘冰
Original assignee: 华为终端有限公司
Priority date: 2012-01-20
Filing date: 2013-01-05
Publication date: 2013-07-25
Also published as: CN103218371B; CN103218371A

Abstract

The present invention relates to the technical field of information processing. Disclosed are an information aggregation method and device. The method comprises: determining related information about an information amount in a file; calculating the distance between different information amounts according to the related information; and aggregating the different information amounts according to the calculated distance between the different information amounts. The information aggregation accuracy can be increased using the present invention.

Description

The present invention relates to the field of information processing technologies, and in particular, to an information aggregation method and apparatus.

Background

Information aggregation combines different pieces of information that are intrinsically linked, such as a person's name, phone number, and email address, into one structure. If these pieces of information all belong to the same person, the name, phone number, and email address can be combined into one larger block of information, which is a structure: (name, phone number, email address). With information aggregation technology, users can be provided with one-stop personalized services based on multi-source information. For example, a terminal device monitors the user's mail or short messages, automatically extracts information of interest from them, such as contact information and event information, generates a calendar event, a transaction reminder event, or an address book contact, and then stores the information in the corresponding location, such as the schedule, the transaction reminders, or the contact list. This helps users process information and improves work efficiency.

Information aggregation is a necessary prerequisite for information extraction. Aggregating information against a quantifiable standard is the core task of information aggregation. The choice of metric affects the quality of the aggregation and, in turn, the final result of information extraction.

In the prior art, a common method of information aggregation is grammatical structure analysis, which uses grammatical principles to combine information according to its grammatical components. Taking Chinese grammar as an example, the sentence components are subject, predicate, object, attributive, adverbial, and complement, and each component places requirements on lexical attributes: a noun can act as a subject, a verb can act as a predicate, an adjective modifies a noun, and so on. Sentence components can therefore be aggregated according to the attributes of the vocabulary. However, the complexity of sentences and the diversity of components make grammatical structure analysis difficult to quantify. For example, the principle of proximity in grammatical analysis is a very complicated problem for a terminal device: because distance has no quantitative definition, the terminal device does not know what is far and what is near. Since grammatical structure analysis is difficult to quantify, the accuracy of information aggregation is low.

Summary of the invention

The embodiments of the present invention provide an information aggregation method and apparatus for improving the accuracy of information aggregation.

To this end, the embodiments of the present invention provide the following technical solutions:

An information aggregation method, including:

determining related information of information amounts in a file;

calculating a distance between different information amounts according to the related information; and

aggregating the different information amounts according to the calculated distance between the different information amounts.

An information aggregation apparatus, including:

an information determining unit, configured to determine related information of information amounts in a file;

a calculating unit, configured to calculate a distance between different information amounts according to the related information; and

an aggregation unit, configured to aggregate the different information amounts according to the calculated distance between the different information amounts.

The information aggregation method and apparatus provided by the embodiments of the present invention determine related information of the information amounts in a file and calculate the distance between different information amounts according to that related information. The distance between different information amounts in the file is thereby quantized, and the quantized distance is used to aggregate the different information amounts, which effectively improves the accuracy of information aggregation.

Brief description of the drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention. Those skilled in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a flowchart of an information aggregation method according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of an information aggregation apparatus according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an aggregation unit according to an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.

The information aggregation method and apparatus according to the embodiments of the present invention determine related information of the information amounts in a file and calculate the distance between different information amounts according to that related information, thereby quantizing the distance between different information amounts in the file. The quantized distance is then used to aggregate the different information amounts, which effectively improves the accuracy of information aggregation.

The information aggregation method in the embodiment of the present invention can be applied to a terminal device or a server. For example, the terminal device monitors the user's mail or short message, and automatically aggregates the information that the user pays attention to.

FIG. 1 is a flowchart of an information aggregation method according to an embodiment of the present invention. The method includes the following steps:

Step 101: Determine related information of the information amounts in a file.

The amount of information refers to information that the user pays attention to, for example, a person name, a phone number, an email address, a conference theme, a meeting place, a meeting content, and the like. Each amount of information consists of one or more strings, each of which has its associated information. In this embodiment, the step may be to determine related information of different amounts of information in the file. It can also be understood that the related information corresponding to the information that the user pays attention to in the file is obtained, or the related information corresponding to the amount of information in the file is obtained.

The file may be a mail or a short message of the user, and may be other files, which are not limited in this embodiment of the present invention. In this embodiment, the file may be the mail or the short message of the user currently received by the terminal device, or may be the mail or the short message of the user that has been stored on the terminal device, which is not limited in the embodiment of the present invention.

In practical applications, a word segmentation technique can be used to first divide the continuous character string of each sentence in the file into separate words, and then determine whether each word is an information amount that needs attention. For example, categories of information that need attention can be predefined, the segmented words can be classified, and whether a word is an information amount to be concerned with can then be determined according to its category. Other methods can also be used to identify the information amounts in the file. For example, a vocabulary of words that need attention can be set, and the contents of the file can then be filtered against this vocabulary to find the information amounts that need attention.

Of course, there are many other ways to identify the amount of information in the file, which is not limited in this embodiment of the present invention.

In the embodiment of the present invention, the related information may be location information, for example, a paragraph position, a starting position, and an ending position of the information amount in the file. The position of the paragraph indicates a position of the natural paragraph in the file, which is a constant; the start position and the end position indicate the position of the information amount in the sentence in the file.

For example, if the information amount is in the first paragraph of the file, its paragraph position is 1; if it is in the second paragraph, its paragraph position is 2, and so on.

For example, the file has the following content: "Xiao Ming went to Beijing on a business trip today, and his phone number is 12345678."

The information amounts that need attention are: Xiao Ming, he, telephone, and 12345678.

Suppose the above content is in the nth paragraph of the file, each Chinese character occupies two positions, each digit occupies one position, and the first position is 1. The positions are counted in the original Chinese text of the sentence. The related information of these information amounts in the file is then:

Xiao Ming (n, 1, 4);

he (n, 21, 22);

telephone (n, 25, 28);

12345678 (n, 31, 38).
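The position triples above can be reproduced with a short sketch. It is only an illustration under stated assumptions: the Chinese sentence is an assumed back-translation of the example (the text here gives only the English rendering), the vocabulary list and the digit pattern are hypothetical stand-ins for whatever recognition method is used, and paragraph number 1 stands in for n.

```python
import re

# Assumed back-translation of the example sentence (not given in the text).
SENTENCE = "小明今天去北京出差，他的电话是12345678。"

# Hypothetical vocabulary of items to watch for, plus a digit-run pattern.
VOCABULARY = ["小明", "他", "电话"]
NUMBER = re.compile(r"\d+")


def width(ch: str) -> int:
    """One position for an ASCII character, two for anything else."""
    return 1 if ord(ch) < 128 else 2


def locate(sentence: str, paragraph: int):
    """Return (text, (paragraph, start, end)) for each item, in reading order."""
    # offsets[i] is the 1-based position at which character i starts.
    offsets, pos = [], 1
    for ch in sentence:
        offsets.append(pos)
        pos += width(ch)

    # Only the first occurrence of each vocabulary word is considered here.
    spans = [(sentence.find(w), len(w)) for w in VOCABULARY if w in sentence]
    spans += [(m.start(), len(m.group())) for m in NUMBER.finditer(sentence)]

    items = []
    for index, length in sorted(spans):
        last = index + length - 1
        end = offsets[last] + width(sentence[last]) - 1
        items.append((sentence[index:index + length], (paragraph, offsets[index], end)))
    return items


for text, info in locate(SENTENCE, paragraph=1):
    print(text, info)
```

Treating every non-ASCII character as two positions is a simplification that happens to match the counting rule stated for this example; a real implementation would need a more careful width rule.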

Of course, the related information may also include other information, such as information such as a grammatical attribute of the amount of information.

Step 102: Calculate a distance between different information amounts according to the related information.

Specifically, a label value is first calculated for each information amount according to its related information. Since each information amount has its own related information, each information amount obtains its own label value. The distance between different information amounts is then calculated from these label values.

In the following, the related information includes only the location information, and the content in the above file is taken as an example for description.

For example, the label value of an information amount can be defined by the following formula (1):

L = paragraph position * label coefficient + (start position + end position) / 2    (1)

where L is the label value of the information amount.

The label coefficient is included in formula (1) to guarantee that the label value calculated for each information amount is unique. In practical applications, the label coefficient can be the number of characters in the longest paragraph of the file, that is, the paragraph containing the most characters. For convenience of description, the label coefficient is denoted max_size. For example, if the file has three natural paragraphs with n1, n2, and n3 characters respectively, then max_size = max(n1, n2, n3).

If max_size takes some other value instead, such as the number of characters in the current paragraph, the uniqueness of the label value cannot be guaranteed.

For example, suppose there are three paragraphs of text: the first has 1000 characters, the second has 500 characters, and the third has 600 characters. If max_size is taken as the number of characters in the current paragraph, the following occurs:

The label value of an information amount in the first paragraph is 1*1000 + (start position + end position)/2. Positions in this paragraph run from 1 to 1000, so the intermediate value (start position + end position)/2 lies in the range (1, 1000) and the label value lies in the range (1000, 2000).

The label value of an information amount in the second paragraph is 2*500 + (start position + end position)/2. Positions in this paragraph run from 1 to 500, so the intermediate value lies in the range (1, 500) and the label value lies in the range (1000, 1500).

In the same way, the label value of an information amount in the third paragraph lies in the range (1800, 2400).

It can be seen that the label value range of the first paragraph covers that of the second paragraph and overlaps that of the third paragraph, so two information amounts in different paragraphs can receive the same label value.

Of course, the above label coefficient may also be a number greater than the maximum value of the number of characters in the paragraph containing the largest number of characters in all the paragraphs in the file.
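The effect of the label coefficient on uniqueness can be checked numerically. The sketch below is an illustration only: the helper names are hypothetical, and it simply applies formula (1) to the three-paragraph example above, once with the coefficient taken from the current paragraph and once with the coefficient fixed at the largest paragraph size.

```python
def label_range(paragraph: int, size: int, coefficient: int):
    """Smallest and largest label value possible in a paragraph of `size` characters.

    Applies formula (1) with the midpoint (start + end) / 2 ranging from 1 to `size`.
    """
    return (paragraph * coefficient + 1, paragraph * coefficient + size)


def overlapping_pairs(sizes, coefficient_for):
    """Return the pairs of paragraph numbers whose label ranges intersect."""
    ranges = [
        label_range(p, n, coefficient_for(n)) for p, n in enumerate(sizes, start=1)
    ]
    pairs = []
    for i in range(len(ranges)):
        for j in range(i + 1, len(ranges)):
            if ranges[i][0] <= ranges[j][1] and ranges[j][0] <= ranges[i][1]:
                pairs.append((i + 1, j + 1))
    return pairs


sizes = [1000, 500, 600]  # characters per paragraph, as in the example above

# Coefficient taken from the current paragraph: the ranges collide.
print(overlapping_pairs(sizes, lambda n: n))           # [(1, 2), (1, 3)]
# Coefficient fixed at the largest paragraph size: the ranges are disjoint.
print(overlapping_pairs(sizes, lambda n: max(sizes)))  # []
```

With a fixed coefficient no smaller than the longest paragraph, each paragraph occupies its own band of label values, which is why the ranges can no longer intersect.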

According to the above formula (1), the label values of the above information amounts can be obtained as follows:

L(Xiao Ming) = n*max_size + 5/2;

L(he) = n*max_size + 43/2;

L(telephone) = n*max_size + 53/2;

L(12345678) = n*max_size + 69/2.

To calculate the distance between different information amounts is to calculate the distance between any two different information amounts in the file. In this embodiment, the absolute value of the difference between the label values of two information amounts is taken as the distance between them, that is, the distance is calculated according to the following formula (2):

d(x, y) = |L(x) - L(y)|    (2)

where x and y represent two different information amounts.

According to formula (2), the distances between the above information amounts are:

d(Xiao Ming, he) = 19;

d(Xiao Ming, telephone) = 24;

d(Xiao Ming, 12345678) = 32;

d(he, telephone) = 5;

d(he, 12345678) = 13;

d(telephone, 12345678) = 8.

Through the above calculation, the distance between different information amounts in the file is quantized, so that the terminal device can accurately identify how far apart they are, which provides an accurate basis for information aggregation.

Step 103: Aggregate the different information amounts according to the calculated distance between them.

During aggregation, the distance between different information amounts is taken into account, and aggregation follows the principle of proximity. The information to be aggregated can be related information of different categories, usually a person's name, phone number, address, and mailbox, and aggregation can also follow information categories defined by the user.

Since referential relationships (such as "he" and "Xiao Ming") and/or peer relationships (such as "telephone" and "12345678") may exist between different information amounts, the distance between the related information amounts can be corrected according to these relationships, and the information amounts corresponding to the minimum distance are then aggregated.

For example, among the distances obtained above, d(Xiao Ming, 12345678) = 32. Because "Xiao Ming" and "he" have a referential relationship, "telephone" and "12345678" have a peer relationship, and d(he, telephone) = 5, d(Xiao Ming, 12345678) can be corrected to 5, the same value as d(he, telephone). The corrected distance between "Xiao Ming" and "12345678" is then compared with the distances between "12345678" and the other names, and the minimum is selected; that is, the phone number "12345678" is aggregated with the name at the shortest distance from it.
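Formulas (1) and (2) and the correction step can be sketched together as follows. This is a minimal illustration, not a definitive implementation: MAX_SIZE and the paragraph number N are hypothetical placeholders (the distances within one paragraph do not depend on either), the referential and peer relationships are entered by hand, and the correction is modelled as a plain substitution of the pronoun-to-keyword distance.

```python
from itertools import combinations

MAX_SIZE = 100  # hypothetical label coefficient; assumed >= the longest paragraph
N = 1           # hypothetical paragraph number standing in for "n"

# (paragraph, start, end) for each information amount, as listed earlier.
ITEMS = {
    "Xiao Ming": (N, 1, 4),
    "he": (N, 21, 22),
    "telephone": (N, 25, 28),
    "12345678": (N, 31, 38),
}


def label(info):
    """Formula (1): paragraph * coefficient + midpoint of start and end."""
    paragraph, start, end = info
    return paragraph * MAX_SIZE + (start + end) / 2


def distance(x, y):
    """Formula (2): absolute difference of the two label values."""
    return abs(label(ITEMS[x]) - label(ITEMS[y]))


# Distance between every pair of information amounts.
for x, y in combinations(ITEMS, 2):
    print(f"d({x}, {y}) = {distance(x, y):g}")

# Correction step: "he" refers to "Xiao Ming" and "telephone" is the peer of
# "12345678", so the name-to-number distance is replaced by d(he, telephone).
refers_to = {"he": "Xiao Ming"}
peer_of = {"telephone": "12345678"}
corrected = {
    (refers_to[p], peer_of[k]): distance(p, k) for p in refers_to for k in peer_of
}
print(corrected)
```

Because both label values in a pair share the same paragraph term, the coefficient cancels out for items in one paragraph; it only matters once items from different paragraphs are compared.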

The referential relationships and peer relationships can be determined according to the grammatical attribute of each information amount, and further according to both the grammatical attributes and the distances between the information amounts. For example, "telephone" and "12345678" are joined by the linking verb "is", so they are determined to be peers. As another example, "Xiao Ming" is a person's name, "he" is a pronoun, and there is no other pronoun in the above text, so they are determined to have a referential relationship. If there are other pronouns in the text, the distance between each pronoun and "Xiao Ming" is calculated, and the nearest pronoun is taken as referring to "Xiao Ming". Likewise, if there are other names in the text, the distance between each name and the pronoun "he" is calculated, and the nearest name is taken as having a referential relationship with "he". When there are multiple names and multiple pronouns, the relationships between names and pronouns can be determined in the same manner.

If there is neither a referential relationship nor a peer relationship between different information amounts, the calculated distances need not be corrected, and the information amounts corresponding to the minimum calculated distance are aggregated directly.

In the example above, different amounts of information appear in a paragraph in the file. The following is a further example of the process of information aggregation in the case where the amount of information is in different paragraphs.

For example, a file has the following contents:

Wang Zong is on a business trip to Beijing tomorrow. His phone number is 12345678.

Wang Zong will meet with Zhang Zong. During the meeting, it is not convenient to answer the phone. If there is an urgent matter, contact Wang Zong's secretary, Xiao Wang. His telephone number is 87654321. Alternatively, send an email directly to Wang Zong. His email address is: abc@domain.com.

For the above text, the amount of information that the user needs to pay attention to is the person's name, phone number, and email address.

The above text has two paragraphs and mentions three people: Wang Zong, Zhang Zong, and Xiao Wang. Wang Zong appears in both paragraphs, once in the first and three times in the second.

There are three occurrences of "he": one in the first paragraph and two in the second. There are two telephone numbers, 12345678 and 87654321, and one email address, abc@domain.com.

Assume that each Chinese character occupies two character positions, each Chinese punctuation mark occupies two character positions, and each ASCII character occupies one character position.

For the above text, the information amounts in the file are determined first, as follows:

The information amounts in the first paragraph are:

Wang Zong, he, telephone, 12345678;

The information amounts in the second paragraph are:

Wang Zong, Zhang Zong, Wang Zong (second), Xiao Wang, he (first), telephone, 87654321, Wang Zong (third), he (second), email address, abc@domain.com.

In the above text, the first paragraph has 40 characters and the second paragraph has 146 characters.

Set max_size = 134.

Since "Wang Zong" appears four times and "he" three times among the information amounts, repeated information amounts are distinguished by the notation "paragraph number-information amount-occurrence number". For example, the third "Wang Zong" in the second paragraph is written 2-Wang Zong-3, and so on.

The related information of the above information amounts in the file is:

1-Wang Zong-1 (1, 1, 4);

1-he-1 (1, 21, 22);

1-telephone-1 (1, 25, 28);

12345678 (1, 31, 38);

2-Wang Zong-1 (2, 1, 4);

Zhang Zong (2, 11, 14);

2-telephone-1 (2, 39, 42);

2-Wang Zong-2 (2, 55, 58);

Xiao Wang (2, 65, 68);

2-he-1 (2, 71, 72);

2-telephone-2 (2, 75, 78);

87654321 (2, 81, 88);

2-Wang Zong-3 (2, 97, 100);

2-he-2 (2, 113, 114);

email address (2, 121, 124);

abc@domain.com (2, 129, 132).
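The naming used in the list above can be generated mechanically. The following sketch is one possible reading of the notation, assuming that an information amount is tagged only when its text occurs more than once in the file and that occurrences are numbered per paragraph; the function name is hypothetical.

```python
from collections import Counter, defaultdict

# (paragraph number, item text) in reading order, following the list above.
FOUND = [
    (1, "Wang Zong"), (1, "he"), (1, "telephone"), (1, "12345678"),
    (2, "Wang Zong"), (2, "Zhang Zong"), (2, "telephone"), (2, "Wang Zong"),
    (2, "Xiao Wang"), (2, "he"), (2, "telephone"), (2, "87654321"),
    (2, "Wang Zong"), (2, "he"), (2, "email address"), (2, "abc@domain.com"),
]


def tag(found):
    """Name repeated items paragraph-item-occurrence; leave unique items as they are."""
    totals = Counter(text for _, text in found)
    seen = defaultdict(int)  # occurrences so far, counted per paragraph
    tags = []
    for paragraph, text in found:
        if totals[text] == 1:
            tags.append(text)
            continue
        seen[(paragraph, text)] += 1
        tags.append(f"{paragraph}-{text}-{seen[(paragraph, text)]}")
    return tags


for name in tag(FOUND):
    print(name)
```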

Then, according to the distance calculation formula defined above, the distance between the two information amounts is calculated. The specific calculation process is similar to the previous example, and is not described here.

After the distances between the information amounts are obtained, the referential relationships and peer relationships between the information amounts are determined.

(1) Determine the referential relationship of the pronoun "he".

d(1-Wang Zong-1, 1-he-1) = |(1 + 4)/2 - (21 + 22)/2| = 19;

d(1-he-1, 2-Wang Zong-1) = |[134 + (21 + 22)/2] - [2*134 + (1 + 4)/2]| = 115.

In the above manner, the distance between this pronoun and other names is also calculated.

This gives six distances (there are six name occurrences, counting repetitions). From the minimum of the six distances, it can be determined that "he" in the first paragraph refers to "Wang Zong" in the first paragraph, that is, "he" and "Wang Zong" have a referential relationship.

In the same way, it can be determined that the first "he" in the second paragraph refers to "Xiao Wang", and the second "he" refers to "Wang Zong". The referential relationships of the personal pronouns in the above text are thus determined.
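The nearest-name rule for pronouns can be sketched as follows, using the positions listed in this example and max_size = 134. The function and variable names are hypothetical, and the sketch covers only the distance criterion; the grammatical-attribute checks described earlier are not modelled.

```python
MAX_SIZE = 134  # label coefficient used in this example

# (paragraph, start, end) of every name occurrence and every pronoun.
NAMES = {
    "1-Wang Zong-1": (1, 1, 4),
    "2-Wang Zong-1": (2, 1, 4),
    "Zhang Zong": (2, 11, 14),
    "2-Wang Zong-2": (2, 55, 58),
    "Xiao Wang": (2, 65, 68),
    "2-Wang Zong-3": (2, 97, 100),
}
PRONOUNS = {
    "1-he-1": (1, 21, 22),
    "2-he-1": (2, 71, 72),
    "2-he-2": (2, 113, 114),
}


def label(info):
    """Formula (1): paragraph * coefficient + midpoint of start and end."""
    paragraph, start, end = info
    return paragraph * MAX_SIZE + (start + end) / 2


def nearest_name(pronoun):
    """Return the name occurrence at the smallest formula (2) distance."""
    target = label(PRONOUNS[pronoun])
    return min(NAMES, key=lambda name: abs(label(NAMES[name]) - target))


for pronoun in PRONOUNS:
    print(pronoun, "->", nearest_name(pronoun))
```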

(2) Determine the peer relationship between the phone and the number.

For example, "1-telephone-1" and "12345678", "2-telephone-2" and "87654321", and "email address" and "abc@domain.com" are peers.

Using the referential relationships and peer relationships identified above, it can be determined that "he" in the first paragraph refers to "Wang Zong" and that "telephone" corresponds to "12345678". Correcting the calculated distance accordingly gives: d(1-Wang Zong-1, 12345678) = d(1-he-1, 1-telephone-1) = 5.

Then the distances between "12345678" and the other names are calculated, and the minimum among all these distances determines whom this phone number belongs to.

After determining the referential relationship and the peer relationship, the person name, the phone number, and the email address in the relevant information amount with the smallest distance are selected for aggregation, and finally the following aggregation result is obtained:

Wang Zong, 12345678, abc@domain.com;

Xiao Wang, 87654321;

Zhang Zong.
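The whole aggregation for this example can be sketched end to end. This is an illustration under several assumptions rather than the definitive procedure: the referential and peer relationships are entered by hand from steps (1) and (2), the positions and max_size = 134 are taken from the list in this example, and the correction is modelled as keeping the smaller of the direct name-to-value distance and the pronoun-to-keyword distance.

```python
MAX_SIZE = 134  # label coefficient used in this example


def label(paragraph, start, end):
    """Formula (1): paragraph * coefficient + midpoint of start and end."""
    return paragraph * MAX_SIZE + (start + end) / 2


# Every name occurrence: (person, label value).
NAMES = [
    ("Wang Zong", label(1, 1, 4)),
    ("Wang Zong", label(2, 1, 4)),
    ("Zhang Zong", label(2, 11, 14)),
    ("Wang Zong", label(2, 55, 58)),
    ("Xiao Wang", label(2, 65, 68)),
    ("Wang Zong", label(2, 97, 100)),
]
# Referential relationships from step (1): (person referred to, pronoun label).
PRONOUNS = [
    ("Wang Zong", label(1, 21, 22)),
    ("Xiao Wang", label(2, 71, 72)),
    ("Wang Zong", label(2, 113, 114)),
]
# Peer relationships from step (2): value -> (keyword label, value label).
VALUES = {
    "12345678": (label(1, 25, 28), label(1, 31, 38)),
    "87654321": (label(2, 75, 78), label(2, 81, 88)),
    "abc@domain.com": (label(2, 121, 124), label(2, 129, 132)),
}


def owner(value):
    """Pick the person at the smallest (possibly corrected) distance from value."""
    keyword_label, value_label = VALUES[value]
    best = {}
    for person, name_label in NAMES:          # direct distances, formula (2)
        d = abs(name_label - value_label)
        best[person] = min(d, best.get(person, d))
    for person, pronoun_label in PRONOUNS:    # corrected distances
        d = abs(pronoun_label - keyword_label)
        best[person] = min(d, best[person])
    return min(best, key=best.get)


result = {person: [] for person, _ in NAMES}
for value in VALUES:
    result[owner(value)].append(value)
print(result)
```

Under these assumptions the sketch assigns both 12345678 and abc@domain.com to Wang Zong, 87654321 to Xiao Wang, and nothing to Zhang Zong, matching the aggregation result above.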

In an actual application, after the terminal device obtains the foregoing aggregation result, the aggregation result may be saved in a corresponding file, and/or displayed to the user for selection by the user.

It can be seen that the information aggregation method in the embodiment of the present invention determines the related information of the information amounts in the file and calculates the distance between different information amounts according to that related information, so that the distance has a specific value. The distance between different information amounts in the file is thereby quantized, and the quantized distance is used to aggregate the different information amounts. This not only allows a terminal device to aggregate information automatically, but also effectively improves the accuracy of information aggregation and provides an accurate information source for information extraction. Because the accuracy of information aggregation is effectively improved, more accurate services can in turn be provided for the information that users care about, which improves the user experience.

Correspondingly, an embodiment of the present invention further provides an information aggregation apparatus, which may be part of a device such as a terminal device or a server. The terminal device may be an intelligent terminal device such as a mobile phone, a PDA, or a tablet computer.

As shown in Figure 2, it is a schematic structural view of the device.

In this embodiment, the apparatus includes:

The information determining unit 201 is configured to determine related information of the information amounts in the file. In this embodiment, an information amount refers to information that the user pays attention to, for example, a person's name, a phone number, an email address, a conference topic, a meeting place, or meeting content. Each information amount consists of one or more strings, each of which has its own related information. The information determining unit 201 may determine the related information of the different information amounts in the file.

The calculating unit 202 is configured to calculate a distance between different information amounts according to the related information. In this embodiment, the calculating unit 202 may first calculate the label value of each information amount according to its related information, so that each information amount obtains its corresponding label value, and then calculate the distance between different information amounts from the calculated label values.

The aggregating unit 203 is configured to aggregate the different information amounts according to the calculated distance between them. During aggregation, the distance between different information amounts is taken into account, and aggregation follows the principle of proximity. The information to be aggregated may be related information of different categories, usually a person's name, phone number, address, and mailbox, or may be aggregated according to information categories defined by the user.

The referential relationships and peer relationships can be determined according to the grammatical attribute of each information amount, and further according to both the grammatical attributes and the distances between the information amounts.

In the embodiment of the present invention, the sentence segmentation technique can be used to first divide the continuous character string in each sentence in the file into different words, and then determine whether each of the words is the amount of information to be concerned. For example, it is possible to predefine categories of information that need to be focused on, classify the segmented word segments, and then determine whether it is the amount of information to be concerned according to the category of each word. In addition, other ways can be used to identify the amount of information in the file. For example, you can set some vocabulary that needs attention, and then filter the contents of the file according to these vocabularies to find out the amount of information that needs to be paid attention to.

The information determining unit 201 can determine the relevant information in the file only for the amount of information that needs attention.

The related information may be location information, such as a paragraph position, a start position, and an end position. The paragraph position represents a natural paragraph position of the information amount in the file; the start position and the end position indicate a position of the information amount in a sentence in the file. Of course, the related information may also include other information, such as information such as a grammatical attribute of the amount of information.

In the embodiment of the present invention, a specific structure of the calculating unit 202 includes a first calculating subunit and a second calculating subunit (not shown), where:

The first calculating subunit is configured to calculate the label value of each information amount according to the related information, specifically according to formula (1) above. Since each information amount has its own related information, each information amount obtains its own label value.

The second calculating subunit is configured to calculate a distance between different information amounts according to the label values, that is, the distance between any two different information amounts in the file. In this embodiment, the absolute value of the difference between the label values of two information amounts may be taken as the distance between them, that is, the distance is calculated according to formula (2) above.

For the detailed calculation process of the label value of the above information amount and the distance between the different information amounts, refer to the description in the information aggregation method of the foregoing embodiment of the present invention, and details are not described herein again.

FIG. 3 is a schematic diagram showing a specific structure of the aggregation unit in the embodiment of the present invention. In this embodiment, the aggregating unit includes:

The relationship determining subunit 301 is configured to determine whether there is a referential relationship and/or a peer relationship between different amounts of information;

The modifying subunit 302 is configured to, when the relationship determining subunit 301 determines that there is a referential relationship and/or a peer relationship between different information amounts, correct the distance between the different information amounts calculated by the calculating unit according to the referential relationship and/or peer relationship determined by the relationship determining subunit 301;

The merging subunit 303 is configured to, when the relationship determining subunit 301 determines that there is a referential relationship and/or a peer relationship between different information amounts, aggregate the information amounts corresponding to the minimum distance among the distances corrected by the modifying subunit 302. In this embodiment, the merging subunit 303 is further configured to, when the relationship determining subunit 301 determines that there is no referential relationship and/or peer relationship between different information amounts, aggregate the information amounts corresponding to the minimum distance among the distances calculated by the calculating unit.

The relationship determining subunit 301 can determine the referential relationship and the peer relationship based on the grammatical attributes and distance relationships of the respective information amounts. In this embodiment, the referential relationship and/or the peer relationship between different information amounts can be determined according to the grammatical attributes of the respective information amounts; further, the grammatical attributes can be combined with the distance relationship of each information amount to refine that judgment. For details, refer to the description in the foregoing embodiments of the present invention; details are not described herein again.

For the specific processing of the modifying subunit 302 and the merging subunit 303, refer to the description in the foregoing embodiments of the present invention; details are not described herein again.
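The cooperation of the three subunits can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the function `aggregate_closest` and its parameters are hypothetical, and the relationship test and the correction rule are passed in as callables because this part of the text does not spell them out. The "quarter the distance" rule in the usage lines is invented purely to show the effect of a correction.

```python
from itertools import combinations


def aggregate_closest(labels, has_relation=None, correct=None):
    """Return the pair of information amounts with the smallest distance.

    labels: dict mapping each information amount to its label value.
    has_relation(a, b): hypothetical predicate standing in for subunit 301.
    correct(a, b, d): hypothetical correction standing in for subunit 302.
    """
    best_pair, best_dist = None, None
    for a, b in combinations(labels, 2):
        d = abs(labels[a] - labels[b])          # distance from the calculating unit
        if has_relation and correct and has_relation(a, b):
            d = correct(a, b, d)                # corrected distance (subunit 302)
        if best_dist is None or d < best_dist:  # keep the minimum (subunit 303)
            best_pair, best_dist = (a, b), d
    return best_pair, best_dist


# Made-up label values for three information amounts.
labels = {"Zhang San": 9.5, "13800000000": 23.5, "li@example.com": 43.0}

# No relationship found: the uncorrected minimum distance decides.
pair_plain, dist_plain = aggregate_closest(labels)

# Assumed rule, for illustration only: a related pair has its distance quartered.
def related(a, b):
    return {a, b} == {"Zhang San", "li@example.com"}

pair_corr, dist_corr = aggregate_closest(labels, related, lambda a, b, d: d / 4)
```

Without a correction the name and the phone number are merged; with the assumed correction the name and the e-mail address become the closest pair, which is the kind of change the modifying subunit is meant to bring about.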

By determining the related information of each information amount in the file and calculating the distances between different information amounts according to that related information, the information aggregation apparatus of the embodiment of the present invention quantizes the distances between different information amounts in the file. The quantized distances are then used to aggregate the information amounts, which effectively improves the accuracy of information aggregation and thus provides an accurate information source for information extraction processing. Because the accuracy of information aggregation is improved, a more accurate service can also be provided for the information that the user needs to pay attention to, thereby improving the user experience.

It should be noted that the information aggregation method and device in the embodiment of the present invention can be applied to a terminal device or a device such as a server, and can not only realize aggregation of text information, but also implement aggregation of image information.

The various embodiments in the present specification are described in a progressive manner; the same or similar parts of the various embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the device embodiment is basically similar to the method embodiment, its description is relatively brief, and the relevant parts can be found in the description of the method embodiment. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.

A person skilled in the art can understand that all or part of the processes of the foregoing embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or replacement that can be easily conceived by those skilled in the art within the technical scope of the present invention shall be covered by the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. An information aggregation method, comprising:

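Determining related information of an information amount in a file;

Calculating a distance between different information amounts according to the related information;

Aggregating the different information amounts according to the calculated distance between the different information amounts.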

2. The method according to claim 1, wherein the amount of information is information of interest to the user.

3. The method according to claim 1, wherein the determining related information of the information amount in the file comprises:

Determining location information of the information amount in the file, where the location information includes a paragraph position, a start position, and an end position; the paragraph position indicates the natural paragraph position of the information amount in the file, and the start position and the end position indicate the position of the information amount within a sentence in the file.

4. The method according to claim 3, wherein the calculating the distance between different information amounts according to the related information comprises:

Calculating a label value of the information amount according to the related information, to obtain label values corresponding to different information amounts;

Calculating the distance between different information amounts according to the label values.

5. The method according to claim 4, wherein

The calculating the label value of the information amount according to the related information includes:

Calculating the label value of the information amount by the following formula: L = paragraph position * label coefficient + (start position + end position) / 2;

The calculating the distance between different information amounts according to the label values includes:

Taking the absolute value of the difference between the label values corresponding to different information amounts as the distance between the different information amounts.

6. The method according to claim 5, wherein the label coefficient is greater than or equal to the number of characters in the paragraph that contains the most characters among all paragraphs in the file.

7. The method according to any one of claims 1 to 6, wherein the aggregating different information amounts according to the calculated distance between the different information amounts comprises:

Determining whether there is a referential relationship and/or a peer relationship between different information amounts;

When it is determined that there is a referential relationship and/or a peer relationship between different information amounts, correcting the distance according to the referential relationship and/or the peer relationship;

Aggregating the information amounts corresponding to the minimum distance among the corrected distances.

8. The method according to claim 7, wherein the aggregating different information amounts according to the calculated distance between the different information amounts further comprises:

When it is determined that there is no referential relationship and/or peer relationship between different information amounts, aggregating the information amounts corresponding to the minimum distance among the calculated distances between the different information amounts.

9. The method according to claim 8, wherein the determining whether there is a referential relationship and/or a peer relationship between different information amounts comprises:

Judging the referential relationship and/or the peer relationship between different information amounts according to the grammatical attributes of the respective information amounts.

10. The method according to claim 8, wherein the determining whether there is a referential relationship and/or a peer relationship between different information amounts further comprises:

Judging the referential relationship and/or the peer relationship between different information amounts according to the grammatical attributes and the distance relationship of the respective information amounts.

11. An information aggregation device, comprising:

an information determining unit, configured to determine related information of an information amount in a file;

a calculating unit, configured to calculate a distance between different information amounts according to the related information;

an aggregation unit, configured to aggregate the different information amounts according to the distance between the different information amounts calculated by the calculating unit.

12. The device according to claim 11, wherein

The information determining unit is specifically configured to determine location information of the information amount in the file, where the location information includes a paragraph position, a start position, and an end position; the paragraph position indicates the natural paragraph position of the information amount in the file; the start position and the end position indicate the position of the information amount within a sentence in the file; and the information amount is information of interest to the user.

13. The device according to claim 12, wherein the calculating unit comprises:

a first calculating subunit, configured to calculate a label value of the information amount according to the related information, to obtain label values corresponding to different information amounts;

a second calculating subunit, configured to calculate the distance between different information amounts according to the label values.

14. The device according to claim 12, wherein:

The first calculating subunit is specifically configured to calculate a label value of the information amount by using the following formula: L = paragraph position * label coefficient + (starting position + ending position) /2;

The second calculating subunit is specifically configured to use the absolute value of the difference between the label values corresponding to different information amounts as the distance between the different information amounts.

15. The device according to any one of claims 11 to 14, wherein the aggregation unit specifically comprises:

a relationship determining subunit, configured to determine whether there is a referential relationship and/or a peer relationship between different information amounts;

a modifying subunit, configured to, when the relationship determining subunit determines that there is a referential relationship and/or a peer relationship between different information amounts, correct the distance between the different information amounts calculated by the calculating unit according to the referential relationship and/or peer relationship determined by the relationship determining subunit;

a merging subunit, configured to, when the relationship determining subunit determines that there is a referential relationship and/or a peer relationship between different information amounts, aggregate the information amounts corresponding to the minimum distance among the distances corrected by the modifying subunit.

16. The device according to claim 15, wherein the merging subunit is further configured to, when the relationship determining subunit determines that there is no referential relationship and/or peer relationship between different information amounts, aggregate the information amounts corresponding to the minimum distance among the distances between the different information amounts calculated by the calculating unit.

17. The device according to claim 15, wherein the relationship determining subunit is further configured to determine a referential relationship and/or a peer relationship between different information amounts according to the grammatical attributes of the respective information amounts.

18. The device according to claim 15, wherein the relationship determining subunit is further configured to determine a referential relationship and/or a peer relationship between different information amounts according to the grammatical attributes and the distance relationship of the respective information amounts.