KR101757047B1 - A Method for Searching Co-Occurrence Based on Co-Operational Formation - Google Patents


Info

Publication number
KR101757047B1
Authority
KR
South Korea
Prior art keywords
word pair
word
pair
state
storing
Prior art date
Application number
KR1020150086897A
Other languages
Korean (ko)
Other versions
KR20160149619A (en)
Inventor
이도헌
박준석
추성지
배성화
장동진
Original Assignee
재단법인 전통천연물기반 유전자동의보감 사업단
한국과학기술원
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-06-18
Filing date
2015-06-18
Publication date
2017-07-12
Application filed by 재단법인 전통천연물기반 유전자동의보감 사업단 and 한국과학기술원
Priority to KR1020150086897A
Publication of KR20160149619A
Application granted
Publication of KR101757047B1

Classifications

    • G06F17/30194
    • G06F17/2705
    • G06F17/30106

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A word-pair cooperative parallel search method according to the present invention comprises the steps of: (a) registering a target text file in the analysis file information of a main server; (b) searching for each word pair in a list of word pairs to be searched among the word pairs previously processed on the main server; (c) storing the word pairs found in step (b) in a result database of the user terminal; (d) having the main server individually check the work state of each word pair not found in step (b) and performing a task according to the result of the check; and (e) storing the word pairs retrieved in step (d) in the result database of the user terminal and registering them as previously processed word pairs on the main server, wherein step (d) is performed by distributing work among a plurality of user terminals.


Description

The present invention relates to a word-pair cooperative parallel search method.

The present invention relates to a cooperative parallel search method using word-pair information, and more particularly, to a word-pair cooperative parallel search method that distributes work between a local cluster and remote clusters.

Biotechnology research based on the analysis of scientific literature is currently an important research field, and it requires a method for efficiently processing large amounts of bibliographic information.

Most bioinformatics analysis systems developed so far (LitInspector, PESCADOR, iHOP, EBIMed, PubGene, PolySearch, etc.) provide predefined document analysis information; that is, they store precomputed results in a database. However, such systems cannot effectively process large-scale document analyses that were not defined in advance.

A Hadoop-based distributed processing framework has recently been proposed to address this problem by using a well-configured cluster, but its efficiency may vary depending on the researcher's information system environment.

Therefore, there is a need for an effective and efficient document analysis method that improves on the conventional methods described above.

Korean Patent Publication No. 10-2014-0146439

The word-pair cooperative parallel search method according to the present invention aims to provide effective and efficient document analysis by using a task scheduler, based on the concept of crowdsourcing, that can perform Hadoop distributed processing in a global environment.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the following description.

To this end, a word-pair cooperative parallel search method according to the present invention comprises the steps of: (a) registering a target text file in the analysis file information of a main server; (b) searching for each word pair in a list of word pairs to be searched among the word pairs previously processed on the main server; (c) storing the word pairs found in step (b) in a result database of the user terminal; (d) having the main server individually check the work state of each word pair not found in step (b) and performing a task according to the result of the check; and (e) storing the word pairs retrieved in step (d) in the result database of the user terminal and registering them as previously processed word pairs on the main server, wherein step (d) is performed by distributing work among a plurality of user terminals.

Step (d) includes: (d-1) when the work state of the checked word pair is 'completed', retrieving the word pair from the main server and storing it in the result database of the user terminal; (d-2) when the work state is 'standby', recognizing that another user terminal is currently working on the word pair and storing it in the waiting list of the user terminal; and (d-3) when the word pair has no work state, switching its work state to 'standby' and storing it in the main server.

Step (d-3) may further include: (d-3-1) switching the work state of the word pair to the standby state and storing it in the main server; (d-3-2) searching for the word pair in the user terminal; (d-3-3) storing the word pair retrieved by the user terminal in the result database of the user terminal; (d-3-4) switching the work state of the word pair to the completed state and storing it in the main server; and (d-3-5) storing the word pair in the main server as a previously processed word pair.

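After step (d-2), a step (d-4) may further be performed, in which the user terminal rechecks whether the work state of each word pair stored in the waiting list is 'completed' and performs step (d-1) or step (d-3) according to the result of the recheck.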

Step (d-4) may be performed by comparing the generation time of each word pair stored in the waiting list with the current time and giving priority to the word pair with the oldest generation time.

Step (d-4) includes: (d-4-1) determining whether a word pair to be searched exists; (d-4-2) if the number of word pairs to be searched in step (d-4-1) is not 0, determining the number of word pairs stored in the waiting list; (d-4-3) if two or more word pairs are stored in the waiting list in step (d-4-2), determining whether the work state of the word pair with the oldest generation time is 'completed'; (d-4-4) if it is 'completed' in step (d-4-3), executing step (d-1) and deleting the word pair from the waiting list; (d-4-5) if it is not 'completed' in step (d-4-3), executing step (d-3); (d-4-6) if only one word pair is stored in the waiting list in step (d-4-2), executing step (d-3); and (d-4-7) repeating steps (d-4-1) to (d-4-6) until the number of word pairs to be searched reaches 0.

The word-pair cooperative parallel search method according to the present invention has the following effects.

First, the present invention distributes work between the local cluster and remote clusters using the Hadoop and Hive distributed processing technologies, making it possible to efficiently search for word pairs that match the needs of various literature researchers.

Second, the present invention can help advance bioinformatics literature research by providing an opportunity to find new word pairs that lie outside the range of pre-searched word pairs provided by existing systems.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned can be clearly understood by those skilled in the art from the following description.

FIG. 1 is a diagram illustrating the data flow between a main server and user terminals in a word-pair cooperative parallel search method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a word-pair cooperative parallel search method according to an embodiment of the present invention; and
FIG. 3 is a flowchart illustrating the detailed procedure of step (d) in the word-pair cooperative parallel search method according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Hereinafter, preferred embodiments of the present invention will be described with reference to the accompanying drawings. In describing the present embodiment, the same designations and the same reference numerals are used for the same components, and further description thereof will be omitted.

FIG. 1 is a diagram illustrating the data flow between a main server 10 and user terminals 20a, 20b, and 20c in a word-pair cooperative parallel search method according to an embodiment of the present invention.

As shown in FIG. 1, the word-pair cooperative parallel search method according to an embodiment of the present invention is performed by distributing work among a plurality of user terminals 20a, 20b, and 20c. Using the Hadoop and Hive distributed processing technologies, work can be distributed between the local cluster and remote clusters, and word pairs can be searched effectively according to the needs of various literature researchers.

FIG. 2 is a flowchart illustrating a word-pair cooperative parallel search method according to an embodiment of the present invention.

As shown in FIG. 2, the word-pair cooperative parallel search method according to an embodiment of the present invention includes: (a) registering a target text file in the analysis file information of a main server; (b) searching for each word pair in a list of word pairs to be searched among the word pairs previously processed on the main server; (c) storing the word pairs found in step (b) in the result database of the user terminal; (d) having the main server individually check each word pair not found in step (b), determining whether it is in the 'completed' state, the 'standby' state, or the unprocessed state (in the work list), and processing the word pairs in the unprocessed state; and (e) storing the retrieved word pairs in the result database of the user terminal and registering them as previously processed word pairs on the main server.

Step (d) may be performed by distributing work among a plurality of user terminals. That is, the collected jobs are distributed among the user terminals so as to reduce the number of jobs that each local terminal must process. Each of the above steps is described in detail below.

First, in step (a), the target text file to be analyzed is registered in the analysis file information of the main server. In this embodiment, the MD5 key, which serves as the unique identifier of the target text file, is registered.
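The MD5-keyed registration in step (a) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `register_target_file` and the dictionary standing in for the main server's analysis file information are hypothetical names, and only `hashlib.md5` is a real library call.

```python
import hashlib


def md5_key(data: bytes) -> str:
    """Return the hexadecimal MD5 digest used as the file's unique identifier."""
    return hashlib.md5(data).hexdigest()


def register_target_file(analysis_file_info: dict, name: str, data: bytes) -> str:
    """Step (a), sketched: register a target text file under its MD5 key.

    `analysis_file_info` is a hypothetical in-memory stand-in for the main
    server's analysis file information. Registering identical content again
    keeps the first entry, because the MD5 key is the same.
    """
    key = md5_key(data)
    analysis_file_info.setdefault(key, name)
    return key


info = {}
key = register_target_file(info, "abstracts.txt", b"BRCA1\tbreast cancer\n")
print(key in info)  # True
```

Keying by content hash rather than file name means two terminals that register the same text end up referring to the same analysis entry.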

Next, step (b) is performed, in which each word pair in the list of word pairs to be searched is looked up among the word pairs previously processed on the main server. Here, a word pair refers to a combination of two or more words separated by a delimiter such as a tab or a semicolon, and "search" means determining whether a word pair to be searched exists in the list of word pairs previously processed on the main server.
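Parsing a delimited word pair and the step (b) lookup can be sketched as below. This is an illustrative sketch only: the delimiter set (tab and semicolon) follows the examples in the text, which says "such as", so the real set may be larger; the function names and the use of a Python `set` for the main server's previously processed pairs are assumptions.

```python
import re

# Assumption: tab and semicolon are the delimiters; the text only says
# "a delimiter such as", so a real deployment may accept more.
DELIMITERS = re.compile(r"[\t;]")


def parse_word_pair(line: str) -> tuple:
    """Split one line of the search list into its words (two or more)."""
    words = tuple(w.strip() for w in DELIMITERS.split(line) if w.strip())
    if len(words) < 2:
        raise ValueError(f"not a word pair: {line!r}")
    return words


def search_pre_worked(pair: tuple, pre_worked: set) -> bool:
    """Step (b) 'search': does the pair already exist on the main server?"""
    return pair in pre_worked


pre_worked = {("BRCA1", "breast cancer")}
print(search_pre_worked(parse_word_pair("BRCA1\tbreast cancer"), pre_worked))  # True
print(search_pre_worked(parse_word_pair("ginseng; fatigue"), pre_worked))      # False
```

The sketch keeps word order, so ("A", "B") and ("B", "A") are different keys; whether the method treats them as the same pair is not stated.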

If a word pair is found in this step, it is retrieved and immediately stored in the result database of the user terminal (step (c)). Accordingly, the number of jobs to be processed is reduced as follows.

$$T_p = T_t - T_c$$

where T_p is the number of jobs remaining to be processed, T_t is the total number of jobs, and T_c is the number of jobs processed in this step.
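Steps (b) and (c), together with the job count, can be sketched as follows, reading the definitions above as T_p = T_t - T_c. This is a hedged illustration: the dictionaries standing in for the main server's previously processed pairs and the terminal's result database, and the integer result values (e.g. a co-occurrence count), are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def steps_b_and_c(
    to_search: List[Pair],
    pre_worked: Dict[Pair, int],
    result_db: Dict[Pair, int],
) -> Tuple[int, List[Pair]]:
    """Copy already-processed pairs into the local result DB.

    Returns (T_c, misses): T_c counts the jobs satisfied here (step (b) hit,
    step (c) store); `misses` are the jobs left for step (d).
    """
    t_c = 0
    misses: List[Pair] = []
    for pair in to_search:
        if pair in pre_worked:
            result_db[pair] = pre_worked[pair]  # step (c): store locally
            t_c += 1
        else:
            misses.append(pair)                 # left for step (d)
    return t_c, misses


pre_worked = {("BRCA1", "breast cancer"): 42}
jobs = [("BRCA1", "breast cancer"), ("ginseng", "fatigue"), ("TP53", "apoptosis")]
local_db: Dict[Pair, int] = {}
t_c, misses = steps_b_and_c(jobs, pre_worked, local_db)
t_t = len(jobs)
t_p = t_t - t_c       # T_p = T_t - T_c
print(t_t, t_c, t_p)  # 3 1 2
```

T_c is counted per job rather than taken from the size of the result database, so duplicate entries in the search list are still counted correctly.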

However, if a word pair is not found in the list of word pairs previously processed on the main server, step (d) is performed, in which the main server checks the word pairs not found in step (b).

Here, step (d) may include steps (d-1) to (d-4). FIG. 3 is a flowchart illustrating the detailed procedure of step (d) in the word-pair cooperative parallel search method according to an embodiment of the present invention.

As described above, in step (d), the main server individually checks the word pairs not found in step (b). Each checked word pair turns out to be in the 'completed' state, in the 'standby' state, or in no work state at all (neither of the two).

Accordingly, in step (d-1), when the work state of the checked word pair is 'completed', the word pair is retrieved from the main server and stored in the result database of the user terminal.


In step (d-2), if the work state of the checked word pair is 'standby', it is recognized that another user terminal is working on it, and the word pair is stored in the waiting list of the user terminal. The work state of the word pairs stored in the waiting list can then be rechecked.

In step (d-3), if the checked word pair has no work state, its work state is switched to 'standby' and stored in the main server. As a result, when another user terminal later checks the work state of this word pair, that terminal stores the pair in its own waiting list.
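The three-way branch of steps (d-1) to (d-3) can be sketched as a small dispatch function. This is a minimal sketch under stated assumptions: `WorkState`, `check_and_dispatch`, and the dictionaries standing in for the main server's state table and stored results are hypothetical, and a missing entry is taken to mean "no work state".

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]


class WorkState(Enum):
    COMPLETED = "completed"
    STANDBY = "standby"


def check_and_dispatch(
    pair: Pair,
    server_state: Dict[Pair, WorkState],
    server_results: Dict[Pair, int],
    result_db: Dict[Pair, int],
    waiting_list: List[Pair],
) -> str:
    """Apply (d-1), (d-2) or (d-3) to one pair missed in step (b)."""
    state: Optional[WorkState] = server_state.get(pair)  # None: no work state
    if state is WorkState.COMPLETED:   # (d-1) fetch the finished result
        result_db[pair] = server_results[pair]
        return "d-1"
    if state is WorkState.STANDBY:     # (d-2) another terminal is working on it
        waiting_list.append(pair)
        return "d-2"
    server_state[pair] = WorkState.STANDBY  # (d-3) claim it for this terminal
    return "d-3"
```

With several terminals sharing one main server, the read and the write in the (d-3) branch would need to happen as a single atomic compare-and-set on the server so that two terminals cannot claim the same pair; a plain dictionary does not provide that, and the text does not say how the main server serializes it.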

More specifically, step (d-3) may include steps (d-3-1) to (d-3-5). In step (d-3-1), the work state of the word pair is switched to the standby state and stored in the main server. In step (d-3-2), the user terminal searches for the word pair.

Then, in step (d-3-3), the word pair retrieved by the user terminal is stored in the result database of the user terminal, and in step (d-3-4), the work state of the word pair is switched to the 'completed' state and stored in the main server.

In step (d-3-5), the word pair is stored in the main server as a previously processed word pair, after which other user terminals can process it through step (d-1).
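Substeps (d-3-1) to (d-3-5) can be sketched as one routine that a terminal runs on a pair it has claimed. This is illustrative only: the text leaves the local search unspecified (the effects section mentions Hadoop and Hive), so `make_cooccurrence_search` is a hypothetical stand-in that simply counts documents containing every word by case-insensitive substring match; the state strings and dictionaries are likewise assumptions.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]


def make_cooccurrence_search(documents: List[str]) -> Callable[[Pair], int]:
    """Hypothetical local search: count documents containing every word.

    Naive case-insensitive substring matching stands in for the real
    distributed search job, which the text does not specify.
    """
    lowered = [doc.lower() for doc in documents]

    def search(pair: Pair) -> int:
        return sum(all(word.lower() in doc for word in pair) for doc in lowered)

    return search


def process_unclaimed_pair(
    pair: Pair,
    server_state: Dict[Pair, str],
    pre_worked: Dict[Pair, int],
    result_db: Dict[Pair, int],
    search: Callable[[Pair], int],
) -> int:
    server_state[pair] = "standby"    # (d-3-1) mark as in progress on the server
    result = search(pair)             # (d-3-2) search on this user terminal
    result_db[pair] = result          # (d-3-3) store in the local result DB
    server_state[pair] = "completed"  # (d-3-4) mark as completed on the server
    pre_worked[pair] = result         # (d-3-5) register as previously processed
    return result


docs = ["BRCA1 is linked to breast cancer.", "TP53 regulates apoptosis.", "BRCA1 repair"]
state: Dict[Pair, str] = {}
pre: Dict[Pair, int] = {}
local: Dict[Pair, int] = {}
print(process_unclaimed_pair(("BRCA1", "breast cancer"), state, pre, local,
                             make_cooccurrence_search(docs)))  # 1
```

Because (d-3-5) registers the result on the main server, any later terminal that looks up the same pair is served by step (b) or step (d-1) instead of repeating the search.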

Next, in step (d-4), the user terminal rechecks on the main server whether the work state of each word pair stored in the waiting list is 'completed', and performs step (d-1) or step (d-3) accordingly. That is, if the recheck shows the 'completed' state, step (d-1) is performed; if the pair is still in the 'standby' state, step (d-3) is performed.

The step (d-4) may be repeated until there is no word pair stored in the waiting list.

In step (d-4), to cover the case where another user terminal cancels its work or a word pair remains in the task list of the main server for an unspecified reason, the generation time of each word pair stored in the waiting list may be compared with the current time, and the word pair with the oldest generation time may be given priority. In this way, the remaining user terminals can bring such a job to the 'completed' state.
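Selecting the waiting pair with the oldest generation time, and its age relative to the current time, can be sketched as follows. This is a hedged sketch: storing waiting-list entries as `(generation time, pair)` tuples is an assumption, and the text does not state what age makes a job "stale", so the function only reports the age.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Pair = Tuple[str, str]
Entry = Tuple[datetime, Pair]  # (generation time, word pair)


def pick_oldest(waiting: List[Entry], now: datetime) -> Tuple[Pair, timedelta]:
    """Return the waiting pair with the oldest generation time and its age.

    The age (now - generation time) is the comparison with the current time
    described in the text; no staleness threshold is applied here.
    """
    created_at, pair = min(waiting, key=lambda entry: entry[0])
    return pair, now - created_at


now = datetime(2015, 6, 18, 12, 0)
waiting = [
    (datetime(2015, 6, 18, 11, 50), ("TP53", "apoptosis")),
    (datetime(2015, 6, 18, 11, 0), ("ginseng", "fatigue")),
]
print(pick_oldest(waiting, now))
```

A linear `min()` scan is enough for a short waiting list; for a long one, a priority queue ordered by generation time (e.g. Python's `heapq`) would avoid rescanning on every iteration.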

The specific algorithm of step (d-4) is as follows.

First, step (d-4-1) is performed to determine whether any word pair remains to be searched, that is, whether the number of word pairs to be searched is 0.

If the number of word pairs to be searched is not 0 in step (d-4-1), step (d-4-2) is performed to determine the number of word pairs stored in the waiting list, that is, whether the waiting list holds one word pair or more than one.

If two or more word pairs are stored in the waiting list, step (d-4-3) is performed to determine whether the work state of the word pair with the oldest generation time is 'completed'.

If it is 'completed', step (d-4-4) is performed: step (d-1) is executed and the word pair is deleted from the waiting list. Otherwise, step (d-4-5) is performed, executing step (d-3) described above.

Alternatively, if only one word pair is stored in the waiting list in step (d-4-2), step (d-4-6) is performed, executing step (d-3).

Thereafter, step (d-4-7) is performed, in which steps (d-4-1) to (d-4-6) are repeated until the number of word pairs to be searched reaches 0 in step (d-4-1).
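The loop of steps (d-4-1) to (d-4-7) can be sketched as below. Two readings here are assumptions rather than statements from the text: "the number of word pairs to be searched" is taken to be the size of this terminal's waiting list, and a pair is removed from the waiting list after step (d-3) runs on it (the text states removal only for (d-4-4), but without it the loop would not terminate). `run_d3` is a hypothetical callback standing in for step (d-3).

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]
Entry = Tuple[float, Pair]  # (generation time, word pair)


def drain_waiting_list(
    waiting: List[Entry],
    server_state: Dict[Pair, str],
    server_results: Dict[Pair, int],
    result_db: Dict[Pair, int],
    run_d3: Callable[[Pair], None],
) -> None:
    """Sketch of (d-4-1)..(d-4-7) for one user terminal's waiting list."""
    while len(waiting) != 0:                              # (d-4-1)
        if len(waiting) >= 2:                             # (d-4-2)
            oldest = min(waiting, key=lambda entry: entry[0])
            pair = oldest[1]
            if server_state.get(pair) == "completed":     # (d-4-3)
                result_db[pair] = server_results[pair]    # (d-4-4): step (d-1)
            else:
                run_d3(pair)                              # (d-4-5): step (d-3)
            waiting.remove(oldest)
        else:
            run_d3(waiting[0][1])                         # (d-4-6): step (d-3)
            waiting.clear()
    # (d-4-7): the while loop repeats until the count reaches 0


state = {("a", "b"): "completed", ("c", "d"): "standby", ("e", "f"): "standby"}
results = {("a", "b"): 5}
local: Dict[Pair, int] = {}
taken_over: List[Pair] = []
queue = [(2.0, ("c", "d")), (1.0, ("a", "b")), (3.0, ("e", "f"))]
drain_waiting_list(queue, state, results, local, taken_over.append)
print(local, taken_over)  # {('a', 'b'): 5} [('c', 'd'), ('e', 'f')]
```

As written in the text, (d-4-6) runs step (d-3) on the last remaining pair without first checking whether another terminal has already completed it; the sketch keeps that behaviour rather than adding a check.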

As described above, the present invention distributes the work of the local cluster among remote clusters, thereby providing a more efficient search environment.

After step (d), step (e) is performed, in which the word pairs retrieved in step (d) are stored in the result database of the user terminal and registered as previously processed word pairs on the main server. That is, a word pair whose processing was completed in step (d) is stored in the main server and registered as a previously processed word pair.

Accordingly, the corresponding word pair can be processed by steps (b) and (c) described above when the search is performed later.

As described above, according to the present invention, the total number of tasks can be reduced as follows through cooperative parallel search among a plurality of user terminals.

[Equation: reduced total number of tasks under cooperative parallel search (image not reproduced)]

where T_t is the total number of jobs, T_c is the number of jobs processed in step (b), T_s is the number of identical jobs, T_l is the number of user terminals, and the remaining term is the number of jobs after the status is released.

The embodiments and the accompanying drawings described in the present specification are merely illustrative of some of the technical ideas included in the present invention. Accordingly, the embodiments disclosed herein are for the purpose of describing rather than limiting the technical spirit of the present invention, and it is apparent that the scope of the technical idea of the present invention is not limited by these embodiments. It will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

10: main server; 20a, 20b, 20c: user terminals

Claims (6)

1. A word-pair cooperative parallel search method comprising:
(a) registering a target text file in the analysis file information of a main server;
(b) searching, in each of a plurality of user terminals, for each word pair of a list of word pairs to be searched among the word pairs previously processed on the main server;
(c) storing the found word pairs in the result database of the user terminal;
(d) the main server individually checking each word pair not found in step (b) to determine whether it is in a task-completed state, a work-waiting state, or an unprocessed state, step (d) including: (d-1) when the work state of the checked word pair is completed (the pair is stored in the list of previously processed word pairs), retrieving the word pair from the main server and storing it in the result database of the user terminal; (d-2) when the work state of the word pair is waiting (the pair is stored in the waiting list), recognizing that another user terminal is working on it and storing the word pair in the waiting list of the user terminal; and (d-3) when the word pair is stored in the work list, moving the word pair to the waiting list, changing its work state to the waiting state, and storing it; and
(e) storing the word pairs retrieved in step (d) in the result database of the user terminal and storing them as previously processed word pairs of the main server;
wherein step (d) is performed by distributing work among a plurality of user terminals.
2. (Deleted)
3. The method according to claim 1, wherein step (d-3) comprises:
(d-3-1) switching the work state of the word pair to the standby state (storing it in the waiting list) and storing it in the main server;
(d-3-2) searching for the word pair in the user terminal;
(d-3-3) storing the word pair retrieved by the user terminal in the result database of the user terminal; and
(d-3-4) switching the work state of the word pair to the completed state (storing it in the list of previously processed word pairs) and storing it in the main server.
4. The method according to claim 1, further comprising, after step (d-2):
(d-4) rechecking whether the work state of each word pair stored in the waiting list is completed, and performing step (d-1) or step (d-3) according to the result of the rechecking.
5. The method of claim 4, wherein step (d-4) is performed by comparing the generation time of each word pair stored in the waiting list with the current time and giving priority to the word pair with the oldest generation time.
6. The method of claim 5, wherein step (d-4) comprises:
(d-4-1) determining whether a word pair to be searched exists;
(d-4-2) when the number of word pairs to be searched in step (d-4-1) is not 0, determining the number of word pairs stored in the waiting list;
(d-4-3) when two or more word pairs are stored in the waiting list in step (d-4-2), determining whether the work state of the word pair with the oldest generation time is completed;
(d-4-4) when the work state of the word pair with the oldest generation time is completed in step (d-4-3), executing step (d-1) and deleting the word pair from the waiting list;
(d-4-5) when the work state of the word pair with the oldest generation time is not completed in step (d-4-3), executing step (d-3);
(d-4-6) when one word pair is stored in the waiting list in step (d-4-2), executing step (d-3); and
(d-4-7) repeating steps (d-4-1) to (d-4-6) until the number of word pairs to be searched becomes 0.
KR1020150086897A 2015-06-18 2015-06-18 A Method for Searching Co-Occurrence Based on Co-Operational Formation KR101757047B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150086897A KR101757047B1 (en) 2015-06-18 2015-06-18 A Method for Searching Co-Occurrence Based on Co-Operational Formation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150086897A KR101757047B1 (en) 2015-06-18 2015-06-18 A Method for Searching Co-Occurrence Based on Co-Operational Formation

Publications (2)

Publication Number Publication Date
KR20160149619A KR20160149619A (en) 2016-12-28
KR101757047B1 true KR101757047B1 (en) 2017-07-12

Family

ID=57724660

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150086897A KR101757047B1 (en) 2015-06-18 2015-06-18 A Method for Searching Co-Occurrence Based on Co-Operational Formation

Country Status (1)

Country Link
KR (1) KR101757047B1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015032228A (en) 2013-08-05 2015-02-16 Kddi株式会社 Program, method, apparatus and server generating co-occurrence pattern for detecting near-synonym

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101595342B1 (en) 2013-06-17 2016-02-18 고려대학교 산학협력단 Apparatus and method for forecasting emerging technology based on patent keyword analysis

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015032228A (en) 2013-08-05 2015-02-16 Kddi株式会社 Program, method, apparatus and server generating co-occurrence pattern for detecting near-synonym

Also Published As

Publication number Publication date
KR20160149619A (en) 2016-12-28

Similar Documents

Publication Publication Date Title
CN109739894B (en) Method, device, equipment and storage medium for supplementing metadata description
US11423082B2 (en) Methods and apparatus for subgraph matching in big data analysis
CN107784026B (en) ETL data processing method and device
GB2456619A (en) Managing errors generated in an apparatus
CN103198103B (en) The microblogging method for pushing of a kind of density based term clustering and device
US10033737B2 (en) System and method for cross-cloud identity matching
US20160125095A1 (en) Lightweight temporal graph management engine
CN107871055B (en) Data analysis method and device
KR102205686B1 (en) Method and apparatus for ranking candiate character and method and device for inputting character
US8918406B2 (en) Intelligent analysis queue construction
CN110232108A (en) Interactive method and conversational system
US20120197938A1 (en) Search request control apparatus and search request control method
KR101757047B1 (en) A Method for Searching Co-Occurrence Based on Co-Operational Formation
CN109902196B (en) Trademark category recommendation method and device, computer equipment and storage medium
WO2017072794A1 (en) An automated remote computing method and system by email platform for molecular analysis
Wakayama et al. Distributed forests for MapReduce-based machine learning
CN113220822A (en) Document data storage method and device
CN110019547B (en) Method, device, equipment and medium for acquiring association relation between clients
CN116304258B (en) Retrieval method, retrieval system and readable storage medium based on vector database
CN110955710A (en) Method and device for processing dirty data in data exchange operation
CN118096091B (en) Mechanical engineering project demand analysis method, system, equipment and medium
KR102670211B1 (en) Functional protein classification for epidemiological research
CN117216800B (en) Privacy removing processing method and device for large-batch medical record data
KR102676806B1 (en) Ttp based automated playbook generation method and system performing the same
CN116431698B (en) Data extraction method, device, equipment and storage medium

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E90F Notification of reason for final refusal
E701 Decision to grant or registration of patent right
GRNT Written decision to grant