CN107506374B - Mailbox author corresponding method and device and computer readable storage medium - Google Patents


Info

Publication number
CN107506374B
Authority
CN
China
Prior art keywords
mailbox
author
occurrence
collaborators
works
Legal status
Active
Application number
CN201710574481.0A
Other languages
Chinese (zh)
Other versions
CN107506374A
Inventor
霍东云
Current Assignee
Beijing Saishi Technology Co Ltd
Original Assignee
Beijing Saishi Technology Co Ltd
Application filed by Beijing Saishi Technology Co Ltd
Priority to CN201710574481.0A
Publication of CN107506374A
Application granted
Publication of CN107506374B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for associating mailboxes with authors, and a computer-readable storage medium. The method comprises: a mailbox searching step of searching for a mailbox in files containing works of a known author; an author and mailbox co-occurrence amount counting step of counting, among the files containing works of the known author, the number of files in which the known author and the mailbox co-occur, namely the author and mailbox co-occurrence amount; a collaborator acquisition step of acquiring the collaborators of the known author for the works contained in the files in which the known author and the mailbox co-occur; a collaborator and mailbox co-occurrence amount counting step of counting, among the files containing works of the known author, the number of files in which each collaborator and the mailbox co-occur, namely the collaborator and mailbox co-occurrence amount; and a mailbox association step of associating the mailbox with the known author when the difference between the author and mailbox co-occurrence amount and the co-occurrence amount of each collaborator and the mailbox is greater than a preset threshold.

Description

Mailbox author corresponding method and device and computer readable storage medium
Technical Field
The invention relates to information retrieval, in particular to author information retrieval.
Background
When an article is retrieved, it may be necessary to contact its authors by email, telephone, and so on. Some articles list telephone numbers and mailboxes but usually do not indicate which author each telephone number or mailbox belongs to, or whether it is the mailbox of an editorial office. When many articles are retrieved, for example when the authors' mailboxes are to be used by a third party, it is all the more necessary to determine the correspondence between authors and mailboxes accurately. The prior art has neither identified this need nor provided a corresponding solution.
Disclosure of Invention
The present invention has been made in view of the above circumstances, and it is an object of the present invention to provide a solution that alleviates or eliminates one or more of the disadvantages of the prior art, and at least provides a useful alternative.
In order to achieve the above object, according to one aspect of the present invention, a mailbox author correspondence method is disclosed, including: a mailbox searching step of searching for a mailbox in files containing works of a known author; an author and mailbox co-occurrence amount counting step of counting, among the files containing works of the known author, the number of files in which the known author and the mailbox co-occur, namely the author and mailbox co-occurrence amount; a collaborator acquisition step of acquiring the collaborators of the known author for the works contained in the files in which the known author and the mailbox co-occur; a collaborator and mailbox co-occurrence amount counting step of counting, among the files containing works of the known author, the number of files in which each collaborator and the mailbox co-occur, namely the collaborator and mailbox co-occurrence amount; and a mailbox association step of associating the mailbox with the known author when the difference between the author and mailbox co-occurrence amount and the co-occurrence amount of each collaborator and the mailbox is greater than a preset threshold.
According to one embodiment, the method further comprises: an author and mailbox co-occurrence rate counting step of counting, among the files containing works of the known author, the proportion of files in which the known author and the mailbox co-occur, namely the author and mailbox co-occurrence rate; and a collaborator and mailbox co-occurrence rate counting step of counting, among the files containing works of each collaborator, the proportion of files in which that collaborator and the mailbox co-occur, namely the collaborator and mailbox co-occurrence rate. When the difference between the author and mailbox co-occurrence amount and the collaborator and mailbox co-occurrence amount is smaller than the preset threshold, the mailbox association step associates the mailbox with the known author if the difference between the author and mailbox co-occurrence rate and the co-occurrence rate of each collaborator and the mailbox is greater than a preset threshold.
According to one embodiment, the mailbox association step does not associate the mailbox with the known author when the author and mailbox co-occurrence rate is below a predetermined threshold.
According to one embodiment, in the collaborator and mailbox co-occurrence rate counting step, files containing joint works of both the known author and the collaborator are removed from the files containing works of that collaborator.
According to another aspect of the present invention, a mailbox author correspondence apparatus is provided, including: a mailbox searching unit that searches for a mailbox in files containing works of a known author; an author and mailbox co-occurrence amount counting unit that counts, among the files containing works of the known author, the number of files in which the known author and the mailbox co-occur, namely the author and mailbox co-occurrence amount; a collaborator acquiring unit that acquires each collaborator of the known author for the works contained in the files in which the known author and the mailbox co-occur; a collaborator and mailbox co-occurrence amount counting unit that counts, among the files containing works of the known author, the collaborator and mailbox co-occurrence amount; a number difference judging unit that determines the difference between the author and mailbox co-occurrence amount and the co-occurrence amount of each collaborator and the mailbox; and a mailbox associating unit that associates the mailbox with the known author when that difference is greater than a preset threshold for every collaborator.
According to yet another aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements the steps of the method of the above aspects.
According to the invention, the author and the mailbox can be easily and accurately associated.
Drawings
The invention may be better understood with reference to the following drawings. The drawings are exemplary only, and are not intended as limitations on the scope of the invention.
FIG. 1 shows a schematic flow chart of an author and mailbox correspondence method according to one embodiment of the present invention;
FIG. 2 shows a schematic block diagram of an author and mailbox correspondence apparatus according to an embodiment of the present invention.
Detailed Description
The following detailed description of the embodiments of the present invention is provided with reference to the accompanying drawings, but the present invention is not limited thereto.
FIG. 1 shows a schematic flow chart of an author and mailbox correspondence method according to an embodiment of the present invention. As shown in FIG. 1, according to one embodiment of the present invention, in step 101 a mailbox is first searched for in the files found to contain works of the known author. The files containing works of the known author are, for example, Word files, web pages, or PDF files. Besides the text of the works, these files include other information such as captions and footnotes, which usually contains information about the authors, such as names, affiliations, and brief biographies, and may of course also contain contact information such as mailboxes.
The known author may be an author known in advance whose mailbox is to be obtained, or an author who appears in the results of a search for files on a particular topic or the like and whose mailbox now needs to be obtained.
There are various methods for extracting mailbox addresses from files. For example, the search function built into Word may be used for Word files, and dedicated extraction tools such as Easy Email Extractor may be used for txt files and FileEmail Extractor for PDF files; simple Java programs can also implement this function. Persons skilled in the art who have the benefit of this disclosure may use any method, now known or developed in the future, to extract mailbox addresses from a file, so this is not described in detail here. When a plurality of mailboxes are found, the subsequent steps may be carried out for them one by one, or carried out each time a mailbox is found.
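The extraction step can be approximated with a regular expression. The sketch below is a simplified illustration and not one of the tools named above: the pattern `EMAIL_PATTERN` and the function `find_mailboxes` are hypothetical, and the pattern deliberately ignores rarer address forms that RFC 5322 permits (quoted local parts, IP-literal domains).

```python
import re

# Hypothetical, deliberately simple pattern: a local part, "@", and a dotted domain
# ending in an alphabetic top-level label. It is not a full RFC 5322 parser.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}")


def find_mailboxes(text: str) -> list[str]:
    """Return the distinct mailbox addresses in `text`, in order of first appearance."""
    seen: dict[str, None] = {}
    for match in EMAIL_PATTERN.finditer(text):
        # Lower-case so that "Foo@163.com" and "foo@163.com" count as one mailbox.
        seen.setdefault(match.group(0).lower(), None)
    return list(seen)


page = "Corresponding author: fineart2017@163.com. Editorial office: Editor@Journal.example.org"
print(find_mailboxes(page))  # ['fineart2017@163.com', 'editor@journal.example.org']
```

Case-folding the whole address is a practical simplification; strictly, only the domain part is case-insensitive.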
Next, in step 102, the number of files in which the known author and the mailbox co-occur, that is, the amount of co-occurrence of the author and the mailbox, is counted in the files including the works of the known author.
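Once each file has been parsed, step 102 reduces to a filtered count. The sketch below assumes, purely for illustration, that every file has already been reduced to a hypothetical `Doc` record holding the names of its authors and the mailboxes found in it; real author-name matching (variants, transliterations, homonyms) is outside its scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical parsed file: author names and mailboxes found in it."""
    authors: frozenset[str]
    mailboxes: frozenset[str]


def author_mailbox_amount(docs: list[Doc], author: str, mailbox: str) -> int:
    """Step 102: number of files with works of `author` that also contain `mailbox`."""
    return sum(1 for d in docs if author in d.authors and mailbox in d.mailboxes)


MAILBOX = "fineart2017@163.com"
# The example used in this description: 100 files with works of Zhao Da, 95 containing the mailbox.
docs = ([Doc(frozenset({"Zhao Da"}), frozenset({MAILBOX}))] * 95
        + [Doc(frozenset({"Zhao Da"}), frozenset())] * 5)
print(author_mailbox_amount(docs, "Zhao Da", MAILBOX))  # 95
```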
Then, in step 103, the collaborators of the known author are obtained from the files in which the known author and the mailbox co-occur. For example, assume that the known author is Zhao Da and that there are 100 files containing his works. When looking for his mailbox, the mailbox fineart2017@163.com is found in step 101, and 95 of the files containing Zhao Da's works are found to contain this mailbox. Of these 95 files, 50 may be works by Zhao Da alone and 45 may be works done in cooperation with others. The collaborators are identified file by file; for example, the people who collaborated with Zhao Da, such as Qian Er, Zhang San, Li Si, Zhou Wu, Wu Liu, Zheng Qi, Wang Jiu, and Cheng Shi (these names are used purely to aid understanding of the present invention and do not suggest or describe any actual collaborative work), can be compiled into a list or stored in a database and processed one by one in the subsequent operations.
Then, in step 104, the number of files in which each collaborator and the mailbox co-occur, among the files containing works of the known author, is counted; this is the collaborator and mailbox co-occurrence amount. That is, the co-occurrences of each collaborator with the mailbox are counted within the 100 files containing works of the known author Zhao Da. A possible result is that Qian Er co-occurs with the mailbox fineart2017@163.com 10 times, Zhang San 4 times, Li Si 6 times, Zhou Wu 3 times, Wu Liu 7 times, Zheng Qi 5 times, Wang Jiu 5 times, and Cheng Shi 5 times.
Then, in step 105, the difference between the author and mailbox co-occurrence amount and each collaborator and mailbox co-occurrence amount is determined. Assuming that the threshold is 80% of the author and mailbox co-occurrence amount, i.e., 95 × 80% = 76, the difference exceeds the threshold for every collaborator (the smallest difference, for Qian Er, is 95 - 10 = 85), so in step 106 the mailbox fineart2017@163.com is associated with the known author Zhao Da. In one embodiment, the threshold may be set as low as 1, so that the mailbox is associated with the known author essentially whenever every collaborator co-occurs with the mailbox less often than the known author does. If the difference is less than zero, i.e., some collaborator co-occurs with the mailbox more often than the known author does, the mailbox may be deemed not to belong to the known author, and processing of the next mailbox may begin.
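Step 103 can be sketched as a set union over the relevant files. As before, the `Doc` record and the function name `acquire_collaborators` are hypothetical; the sketch treats everyone credited on a file other than the known author as a collaborator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical parsed file: author names and mailboxes found in it."""
    authors: frozenset[str]
    mailboxes: frozenset[str]


def acquire_collaborators(docs: list[Doc], author: str, mailbox: str) -> list[str]:
    """Step 103: co-authors of `author` in the files where `author` and `mailbox` co-occur."""
    found: set[str] = set()
    for d in docs:
        if author in d.authors and mailbox in d.mailboxes:
            found |= d.authors - {author}  # everyone else credited on this work
    return sorted(found)  # a stable list that later steps can walk one by one


MAILBOX = "fineart2017@163.com"
docs = [
    Doc(frozenset({"Zhao Da"}), frozenset({MAILBOX})),             # solo work
    Doc(frozenset({"Zhao Da", "Qian Er"}), frozenset({MAILBOX})),  # joint work with the mailbox
    Doc(frozenset({"Zhao Da", "Li Si"}), frozenset()),             # joint work without it
]
print(acquire_collaborators(docs, "Zhao Da", MAILBOX))  # ['Qian Er']
```

Li Si is not returned because the only joint work with Zhao Da does not contain the mailbox.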
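A minimal sketch of step 104 follows, assuming the same hypothetical `Doc` record. Note that the search space is restricted to the known author's files; a file by a collaborator alone is ignored here even if it contains the mailbox.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical parsed file: author names and mailboxes found in it."""
    authors: frozenset[str]
    mailboxes: frozenset[str]


def collaborator_mailbox_amounts(docs: list[Doc], author: str, mailbox: str,
                                 collaborators: list[str]) -> dict[str, int]:
    """Step 104: per collaborator, count files with works of `author` in which the
    collaborator is also credited and `mailbox` appears."""
    author_docs = [d for d in docs if author in d.authors]  # only the known author's files
    return {
        c: sum(1 for d in author_docs if c in d.authors and mailbox in d.mailboxes)
        for c in collaborators
    }


M = "fineart2017@163.com"
docs = ([Doc(frozenset({"Zhao Da", "Qian Er"}), frozenset({M}))] * 10
        + [Doc(frozenset({"Zhao Da", "Zhang San"}), frozenset({M}))] * 4
        + [Doc(frozenset({"Qian Er"}), frozenset({M}))])  # not Zhao Da's file: ignored
print(collaborator_mailbox_amounts(docs, "Zhao Da", M, ["Qian Er", "Zhang San"]))
# {'Qian Er': 10, 'Zhang San': 4}
```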
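Steps 105 and 106 can be written as a single predicate. In this sketch (the function name is hypothetical) the caller supplies the threshold, here 80% of the author's amount as in the example. An author with no collaborators passes trivially because `all()` over an empty collection is true; that is a choice of the sketch, not something the text specifies.

```python
def associate_by_amount(author_amount: int, collaborator_amounts: dict[str, int],
                        threshold: float) -> bool:
    """Steps 105-106: True when the author's co-occurrence amount exceeds every
    collaborator's amount by more than `threshold`."""
    return all(author_amount - amount > threshold for amount in collaborator_amounts.values())


amounts = {"Qian Er": 10, "Zhang San": 4, "Li Si": 6, "Zhou Wu": 3,
           "Wu Liu": 7, "Zheng Qi": 5, "Wang Jiu": 5, "Cheng Shi": 5}
threshold = 0.8 * 95  # 80% of the author's 95 co-occurrences, i.e. 76
print(associate_by_amount(95, amounts, threshold))                     # True: smallest gap is 95 - 10 = 85
print(associate_by_amount(95, {**amounts, "Qian Er": 80}, threshold))  # False: 95 - 80 = 15
```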
On the other hand, according to an embodiment of the present invention, if in step 104 some collaborator co-occurs with the mailbox fineart2017@163.com many times, for example Qian Er co-occurs with it 80 times, then in step 105 the difference between the author and mailbox co-occurrence amount and that collaborator's co-occurrence amount (95 - 80 = 15) is found to be less than the threshold. This indicates that the appearance of the mailbox may be due to Qian Er or to other factors, so even though this count is still lower than the author's 95 co-occurrences, the mailbox should not be directly associated with the known author. In this case, in step 107, the author and mailbox co-occurrence rate is counted, that is, the proportion of the files containing works of the known author in which the known author and the mailbox co-occur. In the above example, there are 100 files containing works of the known author Zhao Da, 95 of which contain the mailbox fineart2017@163.com, giving a co-occurrence rate of 95%. Then, in step 108, the co-occurrence rate of each collaborator and the mailbox is counted, that is, the proportion of the files containing works of that collaborator in which the collaborator and the mailbox co-occur. In this step, the files examined are no longer the files containing works of the known author but the files containing works of each collaborator. According to one embodiment, files containing joint works of the known author and the collaborator may be removed from the files containing works of that collaborator.
For example, when the co-occurrence rate of Qian Er and the mailbox fineart2017@163.com is counted, files containing joint works of Qian Er and Zhao Da are excluded. The results might be, for example, 1% for Qian Er, 0% for Zhang San, and so on. Then, in step 109, the difference between the author and mailbox co-occurrence rate and each collaborator and mailbox co-occurrence rate is calculated. If the author and mailbox co-occurrence rate differs significantly (by more than a predetermined threshold) from the co-occurrence rate of every collaborator and the mailbox, the author is associated with the mailbox in step 106.
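Steps 107 to 109 can be sketched as follows, again over the hypothetical `Doc` record; all function names are illustrative. The `exclude_joint` flag mirrors the embodiment that drops joint works with the known author from the collaborator's files, and the sample data reproduces the 95% and 1% figures above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical parsed file: author names and mailboxes found in it."""
    authors: frozenset[str]
    mailboxes: frozenset[str]


def _rate(files: list[Doc], mailbox: str) -> float:
    """Share of `files` containing `mailbox` (0.0 for an empty list)."""
    return sum(mailbox in d.mailboxes for d in files) / len(files) if files else 0.0


def author_rate(docs: list[Doc], author: str, mailbox: str) -> float:
    """Step 107: author and mailbox co-occurrence rate over the author's own files."""
    return _rate([d for d in docs if author in d.authors], mailbox)


def collaborator_rate(docs: list[Doc], collaborator: str, author: str, mailbox: str,
                      exclude_joint: bool = True) -> float:
    """Step 108: rate over the collaborator's files, optionally dropping joint works with `author`."""
    files = [d for d in docs
             if collaborator in d.authors and not (exclude_joint and author in d.authors)]
    return _rate(files, mailbox)


def associate_by_rate(a_rate: float, collab_rates: dict[str, float], threshold: float) -> bool:
    """Step 109: True when the author's rate beats every collaborator's rate by more than `threshold`."""
    return all(a_rate - r > threshold for r in collab_rates.values())


M = "fineart2017@163.com"
zhao, qian = frozenset({"Zhao Da"}), frozenset({"Qian Er"})
with_m, without_m = frozenset({M}), frozenset()
zhao_files = [Doc(zhao, with_m)] * 85 + [Doc(zhao | qian, with_m)] * 10 + [Doc(zhao, without_m)] * 5
qian_files = [Doc(qian, with_m)] + [Doc(qian, without_m)] * 99
docs = zhao_files + qian_files
print(author_rate(docs, "Zhao Da", M))                # 0.95
print(collaborator_rate(docs, "Qian Er", "Zhao Da", M))  # 0.01
```

Without the exclusion, the ten joint works would lift Qian Er's rate to 11 of 110 files (10%), which shows why the embodiment removes them.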
On the other hand, according to an embodiment of the present invention, if for some collaborator the difference is not significant, for example the co-occurrence rate of Li Si and the mailbox reaches 94%, so that the difference between the rates (1%) is below the threshold (for example, 15%), this indicates that the appearance of the mailbox may be influenced by other factors, such as citations in the files, and the mailbox should not be directly associated with the known author. In this case, in step 110, the distance between the mailbox address and the known author and the distance between the mailbox address and the collaborator are calculated in the files containing joint works of the known author and the collaborator. In the above example, in each file containing a joint work of the known author Zhao Da and the collaborator Li Si, the distance between the mailbox address fineart2017@163.com and the name Zhao Da and the distance between the mailbox address and the name Li Si are determined. There are various methods for calculating such a distance. One example is a file layout method: the layout blocks of the file are obtained, two pieces of text in the same block are considered closer than two pieces of text in different blocks, and within the same block the number of characters between the two pieces of text is taken as the distance. Those skilled in the art may use various methods, now known or developed in the future, to calculate these distances. Then, in step 111, the probability that the mailbox address is closer to the known author than to the collaborator is determined.
For example, if Zhao Da and Li Si have collaborated on 10 works, and in 9 of those files the mailbox is closer to Zhao Da than to Li Si, the probability that the mailbox address is closer to the known author Zhao Da than to the collaborator Li Si is determined to be 90%. When this probability is greater than a predetermined threshold (e.g., 70%), the mailbox is associated with the known author in step 106.
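One possible reading of the file layout method is sketched below, under two assumptions the text does not fix: layout analysis has already split the file into an ordered list of text blocks, and distances are compared as `(tier, value)` tuples, where tier 0 means a shared block measured in characters and tier 1 means different blocks measured in block positions, so any same-block distance sorts before any cross-block one. `text_distance` and `_gap_in_block` are hypothetical names, and only the first occurrence of each string within a block is considered.

```python
from typing import Optional


def _gap_in_block(block: str, a: str, b: str) -> Optional[int]:
    """Characters between the first occurrences of a and b in one block, or None."""
    ia, ib = block.find(a), block.find(b)
    if ia < 0 or ib < 0:
        return None
    if ia <= ib:
        return max(0, ib - (ia + len(a)))  # a comes first: count from the end of a
    return max(0, ia - (ib + len(b)))      # b comes first: count from the end of b


def text_distance(blocks: list[str], a: str, b: str) -> Optional[tuple[int, int]]:
    """Layout-style distance: (0, chars) within a shared block, else (1, block gap)."""
    gaps = [g for g in (_gap_in_block(blk, a, b) for blk in blocks) if g is not None]
    if gaps:
        return (0, min(gaps))
    where_a = [i for i, blk in enumerate(blocks) if a in blk]
    where_b = [i for i, blk in enumerate(blocks) if b in blk]
    if not where_a or not where_b:
        return None  # one of the strings does not appear in the file
    return (1, min(abs(i - j) for i in where_a for j in where_b))


blocks = ["Zhao Da  fineart2017@163.com", "Li Si", "References ..."]
mail = "fineart2017@163.com"
print(text_distance(blocks, mail, "Zhao Da"))  # (0, 2)
print(text_distance(blocks, mail, "Li Si"))    # (1, 1)
```

Plain substring search can misfire when one name is contained in another; a real implementation would need token-level name matching.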
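Step 111 and the final check can be sketched as a simple frequency over joint works. The sketch assumes each joint work has already been reduced to a pair of numeric distances `(d_author, d_collaborator)`; the function names and the 0.7 default, taken from the example threshold, are illustrative.

```python
def distance_probability(pairs: list[tuple[float, float]]) -> float:
    """Step 111: share of joint works in which the mailbox is strictly closer to the
    known author than to the collaborator. Each pair is (d_author, d_collaborator)."""
    if not pairs:
        return 0.0
    closer = sum(1 for d_author, d_collab in pairs if d_author < d_collab)
    return closer / len(pairs)


def associate_by_distance(pairs: list[tuple[float, float]], threshold: float = 0.7) -> bool:
    """Associate when the probability exceeds the threshold (70% in the example)."""
    return distance_probability(pairs) > threshold


# Ten joint works of Zhao Da and Li Si: the mailbox is nearer to Zhao Da in nine of them.
pairs = [(2, 30)] * 9 + [(40, 5)]
print(distance_probability(pairs))   # 0.9
print(associate_by_distance(pairs))  # True
```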
FIG. 2 shows a schematic block diagram of a mailbox and author correspondence apparatus according to an embodiment of the present invention. The description of the block diagram can be used to explain the method of the present invention, and the foregoing description of the method can likewise be used to understand the apparatus.
As shown in FIG. 2, according to an embodiment of the present invention, the mailbox searching unit 201 first searches for a mailbox in the retrieved files containing works of the known author. Next, the author and mailbox co-occurrence amount counting unit 202 counts, among the files containing works of the known author, the number of files in which the known author and the mailbox co-occur, that is, the author and mailbox co-occurrence amount. Then, the collaborator acquiring unit 203 acquires the collaborators of the known author from the files in which the known author and the mailbox co-occur. Next, the collaborator and mailbox co-occurrence amount counting unit 204 counts, among the files containing works of the known author, the number of co-occurrences of each collaborator and the mailbox, that is, the collaborator and mailbox co-occurrence amount. The number difference judgment unit 205 determines the difference between the author and mailbox co-occurrence amount and each collaborator and mailbox co-occurrence amount. The mailbox associating unit 206 associates the mailbox with the known author when the difference is greater than a threshold. If the difference is less than zero, i.e., some collaborator co-occurs with the mailbox more often than the known author does, the mailbox may be deemed not to belong to the known author, and processing of the next mailbox may begin.
On the other hand, according to an embodiment of the present invention, if the number difference judgment unit 205 determines that the difference between the author and mailbox co-occurrence amount and a collaborator and mailbox co-occurrence amount is less than the threshold, the mailbox should not be directly associated with the known author. In this case, the author and mailbox co-occurrence rate counting unit 207 counts the author and mailbox co-occurrence rate, that is, the proportion of the files containing works of the known author in which the known author and the mailbox co-occur. At the same time or afterwards, the collaborator and mailbox co-occurrence rate counting unit 208 counts the collaborator and mailbox co-occurrence rate, that is, the proportion of the files containing works of each collaborator in which that collaborator and the mailbox co-occur. Here the files examined are not the files containing works of the known author but the files containing works of each collaborator. According to one embodiment, files containing joint works of the known author and a collaborator may be removed from the files containing works of that collaborator. The ratio difference calculating unit 209 then calculates the difference between the author and mailbox co-occurrence rate and each collaborator and mailbox co-occurrence rate. If the author and mailbox co-occurrence rate exceeds the co-occurrence rate of every collaborator and the mailbox by more than a predetermined threshold, the mailbox associating unit 206 associates the author with the mailbox.
On the other hand, if the difference is not significant, this indicates that the appearance of the mailbox may be influenced by other factors, such as citations in the files, and the mailbox should not be directly associated with the known author. In this case, the distance calculating unit 210 calculates, in the files containing joint works of the known author and the collaborator, the distance between the mailbox address and the known author and the distance between the mailbox address and the collaborator. There are various methods for calculating such a distance. One example is a file layout method: the layout blocks of the file are obtained, two pieces of text in the same block are considered closer than two pieces of text in different blocks, and within the same block the number of characters between the two pieces of text is taken as the distance. Those skilled in the art may use various methods, now known or developed in the future, to calculate these distances. Then, the distance probability determination unit 211 determines the probability that the mailbox address is closer to the known author than to the collaborator. For example, if Zhao Da and Li Si have collaborated on 10 works and in 9 of those files the mailbox is closer to Zhao Da than to Li Si, this probability is 90%. When the probability is greater than a predetermined threshold (e.g., 70%), the mailbox associating unit 206 associates the mailbox with the known author.
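The apparatus applies its checks as a cascade: amounts first, then rates, then layout distance. The sketch below strings the three tests together over statistics computed beforehand by the counting and calculating units. The function name, parameter names, and the rule that only collaborators with a non-significant rate gap proceed to the distance test are an interpretation for illustration, not a definitive reading of the units' interfaces.

```python
def associate_mailbox(
    author_amount: int,
    collab_amounts: dict[str, int],
    amount_threshold: float,
    author_rate: float,
    collab_rates: dict[str, float],
    rate_threshold: float,
    distance_probs: dict[str, float],
    prob_threshold: float,
    min_author_rate: float = 0.0,
) -> bool:
    """Hypothetical cascade over precomputed statistics (units 205, 209, 211 feeding 206).
    `collab_rates` is expected to cover every collaborator."""
    # Unit 205: a collaborator who co-occurs with the mailbox more often than the author rules it out.
    if any(amount > author_amount for amount in collab_amounts.values()):
        return False
    # Units 205/206: a clear margin on raw co-occurrence amounts is enough.
    if all(author_amount - a > amount_threshold for a in collab_amounts.values()):
        return True
    # Optional floor on the author's own rate (claims 3 and 9).
    if author_rate < min_author_rate:
        return False
    # Unit 209: collaborators whose rate is not clearly below the author's remain ambiguous.
    ambiguous = [c for c, r in collab_rates.items() if author_rate - r <= rate_threshold]
    # Unit 211: each ambiguous collaborator must lose the layout-distance comparison;
    # a missing probability counts as 0.0, i.e. the check fails conservatively.
    return all(distance_probs.get(c, 0.0) > prob_threshold for c in ambiguous)


# The Li Si scenario from the text: amounts inconclusive, rates inconclusive, distance decides.
print(associate_mailbox(95, {"Qian Er": 80, "Li Si": 6}, 76,
                        0.95, {"Qian Er": 0.01, "Li Si": 0.94}, 0.15,
                        {"Li Si": 0.9}, 0.7))  # True
```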
The present invention can be implemented in software which, when run or after being compiled, enables a processor with processing capability, such as a computer CPU, a field-programmable gate array, a chip, or a single-chip microcomputer, to realize the above functions and method steps. The software can be stored in a readable storage medium such as a memory, a hard disk, an optical disc, or a magnetic disk.
The above detailed description of the invention merely gives persons skilled in the art further insight into implementing preferred aspects of the invention and does not limit the scope of the invention; only the claims determine that scope. Therefore, the combinations of features and steps in the foregoing detailed description are not necessary to practice the invention in its broadest sense and are instead taught merely to describe particularly representative examples of the invention. Furthermore, the various features taught in this specification may be combined in various ways that are not specifically exemplified in order to obtain additional useful embodiments of the present invention.

Claims (10)

1. A mailbox author correspondence method, comprising the following steps:
a mailbox searching step, namely searching a mailbox in a file containing the works of the known author;
an author and mailbox co-occurrence amount counting step of counting, in the files containing works of the known author, the number of files in which the known author and the mailbox co-occur, namely the author and mailbox co-occurrence amount;
a collaborator acquisition step of acquiring collaborators of the known author for the works contained in the files in which the known author and the mailbox co-occur;
a collaborator and mailbox co-occurrence amount counting step of counting, in the files containing works of the known author, the number of co-occurrences of each collaborator and the mailbox, namely the collaborator and mailbox co-occurrence amount;
and a mailbox association step of associating the mailbox with the known author when the difference between the co-occurrence amount of the author and the mailbox and the co-occurrence amount of each collaborator and the mailbox is larger than a preset threshold value.
2. A mailbox author correspondence method as claimed in claim 1, wherein the method further comprises:
an author and mailbox co-occurrence rate counting step of counting, in the files containing works of the known author, the proportion of files in which the known author and the mailbox co-occur, namely the author and mailbox co-occurrence rate;
a collaborator and mailbox co-occurrence rate counting step of counting, in the files containing works of each collaborator, the proportion of files in which that collaborator and the mailbox co-occur, namely the collaborator and mailbox co-occurrence rate;
when the difference between the co-occurrence amount of the author and the mailbox and the co-occurrence amount of the collaborators and the mailbox is smaller than a preset threshold value, if the difference between the co-occurrence rate of the author and the mailbox and the co-occurrence rate of each collaborator and mailbox is larger than a preset threshold value, the mailbox association step associates the mailbox with the known author.
3. A mailbox author correspondence method as claimed in claim 2, wherein the mailbox association step does not associate the mailbox with the known author when the co-occurrence rate of the author and the mailbox is lower than a predetermined threshold.
4. The mailbox author correspondence method according to claim 2, wherein in the collaborator and mailbox co-occurrence rate counting step, files containing a joint work of both the known author and the collaborator are removed from the files containing works of that collaborator.
5. The mailbox author correspondence method as claimed in claim 2, wherein the method further comprises:
a distance calculating step of calculating a distance between the mailbox address and the known author and a distance between the mailbox address and the collaborator in a file including the collaborative works of the known author and the collaborators by using a file layout method;
a distance probability calculation step of determining a probability that a distance between the mailbox address and the known author is smaller than a distance between the mailbox address and the collaborator;
the mailbox associating step associates the mailbox with the known author when a probability that a distance between the mailbox address and the known author is less than a distance between the mailbox address and the collaborator is greater than a predetermined threshold.
6. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
7. A mailbox author correspondence apparatus, comprising:
a mailbox searching unit which searches a mailbox in a file containing the works of the known author;
an author and mailbox co-occurrence amount counting unit which counts the number of files in which a known author and a mailbox co-occur in a file containing a work of the known author, namely the author and mailbox co-occurrence amount;
a collaborator acquiring unit that acquires each collaborator of the known author for the works contained in the files in which the known author and the mailbox co-occur;
a collaborator and mailbox co-occurrence amount counting unit that counts, in the files containing works of the known author, the collaborator and mailbox co-occurrence amount;
a number difference calculating unit that calculates the difference between the author and mailbox co-occurrence amount and the co-occurrence amount of each collaborator and the mailbox;
and the mailbox associating unit associates the mailbox with the known author when the difference between the co-occurrence amount of the author and the mailbox and the co-occurrence amount of each collaborator and the mailbox is greater than a preset threshold value.
8. A mailbox author correspondence apparatus as claimed in claim 7, wherein the apparatus further comprises:
an author-mailbox co-occurrence rate counting unit, which counts, among the files containing works of the known author, the proportion of files in which the known author and the mailbox co-occur, i.e., the author-mailbox co-occurrence rate;
a collaborator-mailbox co-occurrence rate counting unit, which counts, among the files containing works of each collaborator, the proportion of files in which that collaborator and the mailbox co-occur, i.e., the collaborator-mailbox co-occurrence rate; and
a rate difference calculating unit, which calculates the difference between the author-mailbox co-occurrence rate and the co-occurrence rate of each collaborator and the mailbox;
wherein, when the difference between the author-mailbox co-occurrence amount and the co-occurrence amount of the collaborators and the mailbox is smaller than the predetermined threshold, the mailbox associating unit associates the mailbox with the known author if the difference between the author-mailbox co-occurrence rate and the co-occurrence rate of each collaborator and the mailbox is greater than a predetermined threshold.
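The fallback in claim 8 (compare rates only when the amount test is inconclusive) can be sketched as follows. This assumes the same hypothetical file model as a dict with `"authors"` and `"text"`, replaces the mailbox search with a literal substring match, and treats a difference equal to the amount threshold as inconclusive as well; `rate` and `associate_amount_then_rate` are hypothetical names.

```python
from typing import Any, Dict, List

File = Dict[str, Any]  # hypothetical: {"authors": [...], "text": "..."}


def has_mailbox(f: File, mailbox: str) -> bool:
    # Simplification: a literal substring match stands in for the mailbox search.
    return mailbox in f["text"]


def rate(files: List[File], person: str, mailbox: str) -> float:
    """Share of a person's files in which the mailbox also appears."""
    own = [f for f in files if person in f["authors"]]
    return sum(has_mailbox(f, mailbox) for f in own) / len(own) if own else 0.0


def associate_amount_then_rate(files: List[File], author: str, mailbox: str,
                               amount_threshold: int, rate_threshold: float) -> bool:
    joint = [f for f in files if author in f["authors"] and has_mailbox(f, mailbox)]
    if not joint:
        return False  # assumption: nothing to associate
    collaborators = {a for f in joint for a in f["authors"] if a != author}
    # Claim 7 test: amount difference against every collaborator.
    amount_diffs = [len(joint) - sum(c in f["authors"] for f in joint)
                    for c in collaborators]
    if all(d > amount_threshold for d in amount_diffs):
        return True
    # Claim 8 fallback: the amounts are too close, so compare co-occurrence rates,
    # each computed over that person's own files.
    author_rate = rate(files, author, mailbox)
    return all(author_rate - rate(files, c, mailbox) > rate_threshold
               for c in collaborators)
```

The rate test helps when a collaborator co-authored every paper that carries the address: the amounts are then identical, but if the collaborator also has other papers without the address, that collaborator's rate is lower than the author's.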
9. The mailbox author correspondence apparatus according to claim 7, wherein, when the author-mailbox co-occurrence rate is lower than a predetermined threshold, the mailbox associating unit does not associate the mailbox with the known author; and the collaborator-mailbox co-occurrence rate counting unit counts the files containing works of each collaborator while excluding files containing collaborative works of both the known author and that collaborator.
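The two refinements in claim 9 can be sketched in Python: a minimum author-mailbox rate below which no association is made, and a collaborator rate computed only over files that do not also list the known author. This is a minimal illustration under stated assumptions: the file model (a dict with `"authors"` and `"text"`), literal substring matching, and the choice to follow the rate gate with the rate-difference test of claim 8 are assumptions, and all function names are hypothetical.

```python
from typing import Any, Dict, List

File = Dict[str, Any]  # hypothetical: {"authors": [...], "text": "..."}


def author_rate(files: List[File], author: str, mailbox: str) -> float:
    """Share of the known author's files in which the mailbox appears."""
    own = [f for f in files if author in f["authors"]]
    return sum(mailbox in f["text"] for f in own) / len(own) if own else 0.0


def collaborator_rate_excluding_joint(files: List[File], author: str,
                                      collaborator: str, mailbox: str) -> float:
    """Collaborator-mailbox rate over the collaborator's files WITHOUT the known author."""
    solo = [f for f in files
            if collaborator in f["authors"] and author not in f["authors"]]
    return sum(mailbox in f["text"] for f in solo) / len(solo) if solo else 0.0


def associate_by_rate(files: List[File], author: str, mailbox: str,
                      min_author_rate: float, rate_diff_threshold: float) -> bool:
    a_rate = author_rate(files, author, mailbox)
    if a_rate < min_author_rate:
        return False  # claim 9: reject when the author-mailbox rate is too low
    joint = [f for f in files if author in f["authors"] and mailbox in f["text"]]
    if not joint:
        return False  # assumption: nothing to associate
    collaborators = {a for f in joint for a in f["authors"] if a != author}
    # Assumption: the rate-difference test of claim 8 is then applied.
    return all(a_rate - collaborator_rate_excluding_joint(files, author, c, mailbox)
               > rate_diff_threshold for c in collaborators)
```

Excluding joint works keeps a collaborator's rate from being inflated by the very papers that carry the author's address: in a sample where `Li Na` co-authored both of `Zhang Wei`'s addressed papers and has two further papers of her own, her rate falls from 0.5 to 0.0 once the joint papers are dropped.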
10. A mailbox author correspondence apparatus as claimed in claim 7, wherein the apparatus further comprises:
a distance calculation unit, which calculates, in files containing collaborative works of the known author and the collaborator, the distance between the mailbox address and the known author and the distance between the mailbox address and the collaborator;
a distance probability calculation unit, which determines the probability that the distance between the mailbox address and the known author is smaller than the distance between the mailbox address and the collaborator; and
the mailbox associating unit associates the mailbox with the known author when the probability that the mailbox address lies closer to the known author than to the collaborator is greater than a predetermined threshold.
CN201710574481.0A 2017-07-14 2017-07-14 Mailbox author corresponding method and device and computer readable storage medium Active CN107506374B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710574481.0A CN107506374B (en) 2017-07-14 2017-07-14 Mailbox author corresponding method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN107506374A (en) 2017-12-22
CN107506374B (en) 2020-02-21

Family

ID=60679892

Country Status (1)

Country Link
CN (1) CN107506374B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103412852A (en) * 2013-08-21 2013-11-27 Guangdong Electronics Industry Institute Co., Ltd. Method for automatically extracting key information from English-language literature
CN104598439A (en) * 2013-10-30 2015-05-06 Alibaba Group Holding Ltd. Method and device for correcting the title of an information object, and method for pushing an information object
CN106294677A (en) * 2016-08-04 2017-01-04 Zhejiang University Name disambiguation method for Chinese authors in English-language literature
CN106776978A (en) * 2016-12-06 2017-05-31 Beijing Saishi Technology Co Ltd Method and device for building an expert database

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120254166A1 (en) * 2011-03-30 2012-10-04 Google Inc. Signature Detection in E-Mails

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Several Issues in the Process of Co-word Analysis; Li Gang et al.; Journal of Library Science in China; 20170430; full text *

Also Published As

Publication number Publication date
CN107506374A (en) 2017-12-22

Similar Documents

Publication Publication Date Title
CN106250513B (en) Event modeling-based event personalized classification method and system
Jackoway et al. Identification of live news events using Twitter
CN103425777B (en) Short message intelligent classification and search method based on improved Bayesian classification
CN109597983B (en) Spelling error correction method and device
JP2005085285A5 (en)
JP5543384B2 (en) Local query extraction apparatus, local query extraction program, and local query extraction method
CN107437215B (en) Book recommendation method based on labels
CN103580939A (en) Method and device for detecting abnormal messages based on account number attributes
CN110737821B (en) Similar event query method, device, storage medium and terminal equipment
CN107085568A (en) Text similarity discrimination method and device
CN105512333A (en) Product comment theme searching method based on emotional tendency
US20160299907A1 (en) Stochastic document clustering using rare features
CN101887415A (en) Automatic extraction method for text document theme word meaning
JP5098631B2 (en) Mail classification system, mail search system
CN110515895B (en) Method and system for carrying out associated storage on data files in big data storage system
CN104899201B (en) Text Extraction, sensitive word determination method, device and server
CN108415971B (en) Method and device for recommending supply and demand information by using knowledge graph
CN107391504A (en) New word identification method and device
CN107506374B (en) Mailbox author corresponding method and device and computer readable storage medium
CN110232160B (en) Method and device for detecting interest point transition event and storage medium
KR101351555B1 (en) classification-extraction system based meaning for text-mining of large data.
JP5798086B2 (en) Device, method and program for extracting pairs of place names and words from a document
CN106933797B (en) Target information generation method and device
CN108090084A (en) Knowledge management method and system
CN107506398B (en) Method for adding label attribute to book

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant