CN101751423A - Article duplicate checking method and system - Google Patents

Info

Publication number
CN101751423A
CN101751423A CN200810239292A CN200810239292A CN101751423A CN 101751423 A CN101751423 A CN 101751423A CN 200810239292 A CN200810239292 A CN 200810239292A CN 200810239292 A CN200810239292 A CN 200810239292A CN 101751423 A CN101751423 A CN 101751423A
Authority
CN
China
Prior art keywords
contribution
information
heavy
signing
sign
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN200810239292A
Other languages
Chinese (zh)
Other versions
CN101751423B (en)
Inventor
沈晓峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING BEIDA FOUNDER ELECTRONICS Co Ltd
New Founder Holdings Development Co ltd
Original Assignee
Peking University Founder Group Co Ltd
Beijing Founder Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Founder Group Co Ltd, Beijing Founder Electronics Co Ltd
Priority to CN2008102392929A
Publication of CN101751423A
Application granted
Publication of CN101751423B
Legal status: Active

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The invention discloses an article duplicate checking method and system, which aim to solve the prior-art problem of duplicate articles appearing when articles are published. The method comprises the following steps: after article information in a production database has been correspondingly modified because an article on the layout was operated on, an event trigger obtains the modified article information, the article information comprising the article content; and a duplicate checking server performs duplicate content comparison on the article information, among the obtained article information, that has not yet undergone duplicate content comparison, and determines duplicate article information. Because the duplicate checking server compares the article information that has not yet been compared, duplicate article information is determined before publication.

Description

Method and system for article duplicate checking
Technical field
The invention belongs to the field of information processing, and in particular relates to a method and system for article duplicate checking.
Background art
At present, in news production, original articles such as text, picture, audio and video articles are submitted to an article server, which stores them in a production database. Taking a submitted text article as an example: on submission, its article information, such as the title, body content and author name, is sent to the server, and the server stores this original article information in the production database. Before publication, the articles stored in the database go through a series of operations such as selection, sign-off and release; an article that has been selected is called an article on the layout. Because the prior art has no article duplicate checking, the same submitted article may be selected more than once, so that duplicate articles appear when the reviewed article information is published on a website or in a newspaper.
Summary of the invention
The purpose of the embodiments of the invention is to provide a method and system for article duplicate checking, in order to solve the prior-art problem of duplicate articles appearing when articles are published.
To achieve this purpose, an embodiment of the invention provides a method for article duplicate checking, comprising:
after article information in the production database has been correspondingly modified because an article on the layout was operated on, an event trigger obtains the modified article information, the article information comprising the article content;
a duplicate checking server performs duplicate content comparison on the article information, among the obtained article information, that has not yet undergone duplicate content comparison, and determines duplicate article information.
An embodiment of the invention also provides a system for article duplicate checking, comprising:
an event trigger, configured to obtain the modified article information after article information in the production database has been correspondingly modified because an article on the layout was operated on, the article information comprising the article content;
a duplicate checking server, configured to perform duplicate content comparison on the article information, among the obtained article information, that has not yet undergone duplicate content comparison, and to determine duplicate article information.
As can be seen from the above embodiments, because the duplicate checking server performs duplicate content comparison on the article information obtained by the event trigger that has not yet been compared, duplicate article information is determined, which reduces the number of duplicate articles that appear when articles are published.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the first embodiment provided by the invention;
Fig. 2 is a structural diagram of the system of the second embodiment provided by the invention.
Detailed description of the embodiments
To solve the prior-art problem of duplicate articles appearing when articles are published, the embodiments of the invention provide a method and system for article duplicate checking. The description below takes newspaper production as an example, but the scheme is not limited to it and applies equally to, for example, Internet news publishing. When an article on the layout is operated on, the article information in the production database is modified, and the event trigger obtains the article information, which includes the article content, from the production database. The duplicate checking server checks for duplicates among the obtained article information that has not yet undergone duplicate content comparison, and determines duplicate article information. This scheme reduces the number of duplicate articles that appear when articles are published.
The first embodiment provided by the invention is a method for article duplicate checking. It is described first using the production process of a newspaper production system as an example. As shown in Fig. 1, the method flow comprises:
Step 101: when an article on the layout is operated on, the article information in the production database is modified, and the event trigger obtains the article information from the production database.
Step 102: the event trigger classifies the obtained article information of each article and stores it in the duplicate checking database of the duplicate checking server.
Step 103: the duplicate checking server checks the article information in the duplicate checking database that has not yet been checked, and determines duplicate article information.
Step 104: the duplicate checking server collates the data in the duplicate checking database, the data in the newspaper production database and the real-time duplicate article reminders, and synchronizes them.
Step 105: a real-time duplicate article reminder is sent to the user who signed off the article, and the duplicate article information can be viewed through the duplicate checking user workbench.
The article information for which sign-off is carried out comprises new sign-offs, moves to another layout or modifications of the article label after sign-off, and withdrawals of sign-off (that is, cancelling the sign-off after it has been made).
In the newspaper production system, when an article on the layout is operated on, the article information stored in the article information table of the newspaper production database is correspondingly modified. Through an event trigger created on the article information table, when an article is signed off, moved to another layout, has its article label modified (article label: the article information other than the article content, for example the title and author) or has its sign-off withdrawn, the event trigger obtains the correspondingly modified article information in real time and synchronizes (copies) it to Table 1, the signed-off article cache table, in the newspaper production database. The article information comprises the fields of Table 1 other than modify_status and duple_id.
Primary key | Name | Data type | Length | Note
P | id | Bigint | - | -
- | paper_code | Varchar | 32 | Newspaper the article belongs to
- | column_code | Varchar | 32 | Column the article belongs to
- | filecode | Varchar | 32 | Article code
- | author | Varchar | 32 | Author
- | title | Varchar | 255 | Title
- | sub_title | Varchar | 255 | Subtitle
- | pull_title | Varchar | 255 | Kicker (introductory headline)
- | content | Text | - | Content
- | words | Int | - | Word count
- | sign_time | Datetime | - | Sign-off time
- | sign_user_code | Varchar | 32 | Sign-off user code
- | sign_user_name | Varchar | 32 | Sign-off user name
- | column_date | Datetime | - | Issue date
- | layout_name | Varchar | 32 | Layout name
- | layout_code | Varchar | 32 | Layout code
- | flow_type | Int | - | Article workflow type: 0 = non-workflow article, 1 = workflow article
- | modify_status | Int | - | Article status: 0 = signed off and duplicate-checked, related information not modified; 1 = signed off but not yet duplicate-checked, related information not modified; 2 = sign-off withdrawn after signing off; >= 3 = signed off, but related information has been modified
- | duple_id | Bigint | - | ID assigned after duplicate checking
Table 1
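The trigger-based copy into the signed-off article cache table can be sketched as follows. This is only an illustration under stated assumptions: the patent does not name a database product, so SQLite stands in for the production database, and the names `article_info`, `signed_cache`, `signed` and `on_sign` are hypothetical. Only `modify_status` and `duple_id` come from Table 1.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical sign-off trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical stand-in for the article information table.
        CREATE TABLE article_info (
            id INTEGER PRIMARY KEY,
            title TEXT,
            content TEXT,
            signed INTEGER DEFAULT 0
        );
        -- Reduced stand-in for Table 1 (signed-off article cache table).
        CREATE TABLE signed_cache (
            id INTEGER PRIMARY KEY,
            title TEXT,
            content TEXT,
            modify_status INTEGER,
            duple_id INTEGER
        );
        -- When an article becomes signed off, copy it into the cache
        -- with modify_status = 1 (not yet checked) and a null duple_id.
        CREATE TRIGGER on_sign AFTER UPDATE OF signed ON article_info
        WHEN NEW.signed = 1 AND OLD.signed = 0
        BEGIN
            INSERT INTO signed_cache (id, title, content, modify_status, duple_id)
            VALUES (NEW.id, NEW.title, NEW.content, 1, NULL);
        END;
        """
    )
    return conn
```

Updating `signed` from 0 to 1 on a row of `article_info` fires the trigger, so the application code never has to write to the cache table itself.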
When an article is signed off, the event trigger inserts the article status information modify_status into Table 1: the values of all fields in Table 1 other than modify_status and duple_id are consistent with the article information in the newspaper production database, modify_status is set to 1, and duple_id is null.
When the sign-off of an article is withdrawn, the event trigger sets modify_status to 2.
When a signed-off article is moved to another layout or has its article label modified, the event trigger synchronizes the article information after the move or modification. If modify_status is less than 3, it is set to 3; if modify_status is greater than or equal to 3, it is incremented by 1.
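The three status rules above amount to a small state transition. A minimal sketch, assuming hypothetical event names `"sign"`, `"withdraw"` and `"modify"` (the patent describes the events but does not name them):

```python
from typing import Optional


def next_modify_status(current: Optional[int], event: str) -> int:
    """Return the new modify_status for a cached article.

    `current` is None when the article has no row in Table 1 yet.
    The event names are illustrative assumptions.
    """
    if event == "sign":
        # Newly signed off: waiting for a duplicate check.
        return 1
    if event == "withdraw":
        # Sign-off withdrawn after signing off.
        return 2
    if event == "modify":
        # Moved to another layout or article label modified.
        if current is None:
            raise ValueError("cannot modify an article that is not cached")
        return 3 if current < 3 else current + 1
    raise ValueError(f"unknown event: {event}")
```

Counting upward from 3 means that any further modification changes the value again, which is what lets a later step detect that an article was touched while a check was running.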
Before step 101, the method may also delete from Table 1 all articles whose issue date is earlier than the current day. Articles of past issue dates are already published; they can enter the duplicate checking database as historical articles through the ingest agent, and no longer belong to the current day's articles that need real-time duplicate checking.
If the check finds that all articles in Table 1 have been duplicate-checked, that is, every modify_status is 0, then all articles have been checked and there are no newly signed-off articles that need checking (duplicate checking means duplicate content comparison, i.e. comparing the article content in the content field of the article information), and the procedure ends.
If the modify_status values of the article information in Table 1 are not all 0, the article information in each status is fetched from Table 1.
First, the article information whose sign-off was withdrawn, i.e. with flag modify_status equal to 2, is fetched and held temporarily in the withdrawn article list (cancleDocumentList, in memory); it can of course also be stored on hard disk or in a database.
Next, the newly signed-off articles, i.e. those with modify_status equal to 1, are fetched and held in the new sign-off list (newDocumentList, in memory).
Finally, the articles that were moved to another layout or had their article label modified after sign-off, i.e. those with modify_status >= 3, are fetched and held in the modified article list (modifiedDocumentList, in memory).
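The status-based split into the three in-memory lists can be sketched as below. The `CachedArticle` class is a hypothetical, reduced stand-in for a row of Table 1; the list names follow the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CachedArticle:
    """Reduced, illustrative view of a Table 1 row."""
    id: int
    content: str
    modify_status: int
    duple_id: Optional[int] = None


def partition_by_status(
    rows: List[CachedArticle],
) -> Tuple[List[CachedArticle], List[CachedArticle], List[CachedArticle]]:
    """Split cached rows into (withdrawn, new, modified) lists."""
    # Rows with modify_status == 0 are already checked and are skipped.
    cancleDocumentList = [r for r in rows if r.modify_status == 2]
    newDocumentList = [r for r in rows if r.modify_status == 1]
    modifiedDocumentList = [r for r in rows if r.modify_status >= 3]
    return cancleDocumentList, newDocumentList, modifiedDocumentList
```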
Before step 102, the contents of Table 2, the duplicate checking article information buffer table, and Table 3, the duplicate checking result information buffer table, in the duplicate checking database may also all be cleared.
Primary key | Name | Data type | Length | Note
P | id | Bigint | - | -
- | column_code | Varchar | 32 | Column the article belongs to
- | filecode | Varchar | 32 | Article code
- | author | Varchar | 32 | Author
- | title | Varchar | 255 | Title
- | sub_title | Varchar | 255 | Subtitle
- | pull_title | Varchar | 255 | Kicker (introductory headline)
- | content | Text | - | Content
- | words | Int | - | Word count
- | sign_time | Datetime | - | Sign-off time
- | sign_user_code | Varchar | 32 | Sign-off user code
- | sign_user_name | Varchar | 32 | Sign-off user name
- | column_date | Datetime | - | Issue date
- | fill_time | Datetime | - | Entry time
- | layout_name | Varchar | 32 | Layout name
- | layout_code | Varchar | 32 | Layout code
- | calculate_status | Int | - | Whether the article has been duplicate-checked: 0 = not checked, 1 = checked
- | publish_status | Int | - | Publication status: 0 = not published, 1 = published
- | flow_type | Int | - | Article workflow type: 0 = non-workflow article, 1 = workflow article
Table 2
Primary key | Name | Data type | Length | Note
P | this_id | Bigint | - | ID of article 1
- | this_column_code | Varchar | 32 | Column name of article 1
- | this_filecode | Varchar | 32 | Article code of article 1
- | this_author | Varchar | 32 | Author of article 1
- | this_title | Varchar | 255 | Title of article 1
- | this_sub_title | Varchar | 255 | Subtitle of article 1
- | this_pull_title | Varchar | 255 | Kicker of article 1
- | this_words | Int | - | Word count of article 1
- | this_sign_time | Datetime | - | Sign-off time of article 1
- | this_sign_user_code | Varchar | 32 | Sign-off user code of article 1
- | this_sign_user_name | Varchar | 32 | Sign-off user name of article 1
- | this_fill_time | Datetime | - | Entry time of article 1
- | this_layout_name | Varchar | 32 | Layout name of article 1
- | this_layout_code | Varchar | 32 | Layout code of article 1
- | this_publish_status | Int | - | Publication status of article 1: 0 = not published, 1 = published
- | this_flow_type | Int | - | Workflow type of article 1: 0 = non-workflow article, 1 = workflow article
P | that_id | Bigint | - | ID of article 2
- | that_column_code | Varchar | 32 | Column name of article 2
- | that_filecode | Varchar | 32 | Article code of article 2
- | that_author | Varchar | 32 | Author of article 2
- | that_title | Varchar | 255 | Title of article 2
- | that_sub_title | Varchar | 255 | Subtitle of article 2
- | that_pull_title | Varchar | 255 | Kicker of article 2
- | that_words | Int | - | Word count of article 2
- | that_sign_time | Datetime | - | Sign-off time of article 2
- | that_sign_user_code | Varchar | 32 | Sign-off user code of article 2
- | that_sign_user_name | Varchar | 32 | Sign-off user name of article 2
- | that_fill_time | Datetime | - | Entry time of article 2
- | that_layout_name | Varchar | 32 | Layout name of article 2
- | that_layout_code | Varchar | 32 | Layout code of article 2
- | that_publish_status | Int | - | Publication status of article 2: 0 = not published, 1 = published
- | that_flow_type | Int | - | Workflow type of article 2: 0 = non-workflow article, 1 = workflow article
- | duple_rate | Int | - | Duplicate similarity between article 1 and article 2
Table 3
In step 102, first, for the information of articles that were moved to another layout or had their article label modified after sign-off: if the article has not been duplicate-checked, it is handled as a newly signed-off article; if it has been checked, only the article information in the duplicate checking database needs to be synchronized, and the article production database is notified to synchronize.
Loop over the article list modifiedDocumentList. If an article's information has a duple_id, the article has already been duplicate-checked; its information in modifiedDocumentList is then synchronized with the record in Table 4, the duplicate checking article information table of the duplicate checking database, whose id equals duple_id, and with the duplicate checking records in Table 5, the duplicate checking result information table, whose this_id or that_id equals duple_id (for example, a record whose duple_rate, the duplicate similarity of article 1 and article 2, is 80%).
Primary key | Name | Data type | Length | Note
P | id | Bigint | - | -
- | column_code | Varchar | 32 | Column the article belongs to
- | filecode | Varchar | 32 | Article code
- | author | Varchar | 32 | Author
- | title | Varchar | 255 | Title
- | sub_title | Varchar | 255 | Subtitle
- | pull_title | Varchar | 255 | Kicker (introductory headline)
- | content | Text | - | Content
- | words | Int | - | Word count
- | sign_time | Datetime | - | Sign-off time
- | sign_user_code | Varchar | 32 | Sign-off user code
- | sign_user_name | Varchar | 32 | Sign-off user name
- | column_date | Datetime | - | Issue date
- | fill_time | Datetime | - | Entry time
- | layout_name | Varchar | 32 | Layout name
- | layout_code | Varchar | 32 | Layout code
- | calculate_status | Int | - | Whether the article has been duplicate-checked: 0 = not checked, 1 = checked
- | publish_status | Int | - | Publication status: 0 = not published, 1 = published
- | flow_type | Int | - | Article workflow type: 0 = non-workflow article, 1 = workflow article
Table 4
Primary key | Name | Data type | Length | Note
P | this_id | Bigint | - | ID of article 1
- | this_column_code | Varchar | 32 | Column name of article 1
- | this_filecode | Varchar | 32 | Article code of article 1
- | this_author | Varchar | 32 | Author of article 1
- | this_title | Varchar | 255 | Title of article 1
- | this_sub_title | Varchar | 255 | Subtitle of article 1
- | this_pull_title | Varchar | 255 | Kicker of article 1
- | this_words | Int | - | Word count of article 1
- | this_sign_time | Datetime | - | Sign-off time of article 1
- | this_sign_user_code | Varchar | 32 | Sign-off user code of article 1
- | this_sign_user_name | Varchar | 32 | Sign-off user name of article 1
- | this_fill_time | Datetime | - | Entry time of article 1
- | this_layout_name | Varchar | 32 | Layout name of article 1
- | this_layout_code | Varchar | 32 | Layout code of article 1
- | this_publish_status | Int | - | Publication status of article 1: 0 = not published, 1 = published
- | this_flow_type | Int | - | Workflow type of article 1: 0 = non-workflow article, 1 = workflow article
P | that_id | Bigint | - | ID of article 2
- | that_column_code | Varchar | 32 | Column name of article 2
- | that_filecode | Varchar | 32 | Article code of article 2
- | that_author | Varchar | 32 | Author of article 2
- | that_title | Varchar | 255 | Title of article 2
- | that_sub_title | Varchar | 255 | Subtitle of article 2
- | that_pull_title | Varchar | 255 | Kicker of article 2
- | that_words | Int | - | Word count of article 2
- | that_sign_time | Datetime | - | Sign-off time of article 2
- | that_sign_user_code | Varchar | 32 | Sign-off user code of article 2
- | that_sign_user_name | Varchar | 32 | Sign-off user name of article 2
- | that_fill_time | Datetime | - | Entry time of article 2
- | that_layout_name | Varchar | 32 | Layout name of article 2
- | that_layout_code | Varchar | 32 | Layout code of article 2
- | that_publish_status | Int | - | Publication status of article 2: 0 = not published, 1 = published
- | that_flow_type | Int | - | Workflow type of article 2: 0 = non-workflow article, 1 = workflow article
- | duple_rate | Int | - | Duplicate similarity between article 1 and article 2
Table 5
If an article's information in modifiedDocumentList has no duple_id, the article has not yet been duplicate-checked; the article information is inserted into Table 2 and a duple_id is generated for the article at the same time.
Second, for newly signed-off article information, the article information is saved to the duplicate checking database.
Loop over the article list newDocumentList, insert each article's information into Table 2, and generate a duple_id for the article at the same time.
Finally, for articles whose sign-off was withdrawn, the corresponding data in the duplicate checking database is deleted.
Loop over the article list cancleDocumentList. If an article has a duple_id, its sign-off was withdrawn after it had been duplicate-checked; the article information in Table 4 whose id equals duple_id is deleted, and the duplicate checking result records in Table 5 whose this_id or that_id equals duple_id are deleted.
If an article has no duple_id, its sign-off was withdrawn before it was duplicate-checked, and no further processing is needed here.
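The handling of the three lists in step 102 can be sketched with plain dictionaries standing in for the tables. Everything here beyond the field name `duple_id` is an assumption for illustration: `table2` and `table4` are dictionaries keyed by id, `table5` is a list of `(this_id, that_id, duple_rate)` tuples, and `ids` is any iterator that yields fresh identifiers.

```python
from itertools import count
from typing import Dict, Iterator, List, Tuple

Article = Dict[str, object]        # e.g. {"title": ..., "content": ..., "duple_id": ...}
ResultRow = Tuple[int, int, int]   # (this_id, that_id, duple_rate)


def stage_articles(
    modified: List[Article],
    new: List[Article],
    cancelled: List[Article],
    table2: Dict[int, Article],
    table4: Dict[int, Article],
    table5: List[ResultRow],
    ids: Iterator[int],
) -> List[int]:
    """Apply the step 102 rules; return the newly generated duple_id values."""
    generated: List[int] = []

    def queue_for_check(article: Article) -> None:
        # Insert into the buffer table and generate a duple_id.
        new_id = next(ids)
        table2[new_id] = {**article, "duple_id": new_id}
        generated.append(new_id)

    for article in modified:
        dup = article.get("duple_id")
        if dup is None:
            queue_for_check(article)   # never checked: treat as new
        elif dup in table4:
            # Already checked: only synchronize the stored copy.
            table4[dup].update(
                {k: v for k, v in article.items() if k != "duple_id"}
            )
    for article in new:
        queue_for_check(article)
    for article in cancelled:
        dup = article.get("duple_id")
        if dup is not None:
            # Checked earlier, now withdrawn: drop article and results.
            table4.pop(dup, None)
            table5[:] = [row for row in table5 if dup not in (row[0], row[1])]
    return generated
```

The sketch leaves out the notification to the production database, which the text describes separately.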
Step 103 can be carried out by third-party duplicate checking software, for example a third-party mass-deduplication base component plug-in, version 2.0. First the content of historical articles is input to the plug-in, which automatically builds a deduplication library in memory using efficient Chinese word segmentation. Then the content of the article to be checked and a minimum similarity value are input to the plug-in, which uses accurate Chinese word segmentation comparison to compare the article against the content of every article in the deduplication library and returns all comparison results above the minimum similarity. Before this, the various duplicate checking configuration parameters, including the minimum similarity and the duplicate checking keywords, may be fetched from the duplicate checking database; the values of these parameters directly affect the duplicate checking information finally confirmed. All articles that need checking are fetched from Table 2 and checked one by one, and the result for each article is saved in a single duplicate checking result list (duplationList). The checking covers not only comparison with historical published articles (the articles in Table 4) but also comparison with the other articles signed off in real time (the articles in Table 2).
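The comparison against a minimum similarity can be illustrated with a stand-in. The sketch below does not reproduce the third-party plug-in or its word-segmentation technique; it assumes Python's standard `difflib.SequenceMatcher`, which compares character sequences, and the function name `find_duplicates` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def find_duplicates(
    content: str, library: Dict[int, str], min_similarity: int
) -> List[Tuple[int, int]]:
    """Return (article_id, similarity_percent) pairs above the threshold.

    `library` maps article ids to article content, playing the role of
    the deduplication library built from historical articles.
    """
    hits: List[Tuple[int, int]] = []
    for article_id, other in library.items():
        # ratio() is in [0, 1]; scale to an integer percentage,
        # matching the Int type of duple_rate.
        rate = round(SequenceMatcher(None, content, other).ratio() * 100)
        if rate > min_similarity:
            hits.append((article_id, rate))
    # Most similar first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

A character-level ratio is much cruder than segmentation-based comparison for Chinese text, so the threshold values are not transferable between the two approaches.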
After the checking ends, loop over the result list duplationList and insert the finally confirmed duplicate checking information into Table 3.
In step 104, first, all data in Table 2 is imported into Table 4 and all data in Table 3 is imported into Table 5, for use when the duplicate checking user workbench retrieves duplicate checking results. The whole import is committed in one operation, which guarantees the completeness and accuracy of the results retrieved by the user workbench.
Second, all articles in Table 1 of the newspaper production database that correspond to the article list cancleDocumentList are deleted.
Finally, the article information in Table 1 of the newspaper database that corresponds to the lists newDocumentList and modifiedDocumentList is synchronized, and the field modify_status is set to checked, i.e. 0. During duplicate checking, the newspaper production system may operate on articles in Table 1 at the same time, and the trigger may change modify_status; therefore only articles whose modify_status has not changed throughout are updated here. The article identifier in the duplicate checking database is also backfilled, i.e. the duple_id field is assigned: for articles whose duple_id was newly generated earlier (as in step 102, where an article in modifiedDocumentList without a duple_id has not yet been checked, so its information is inserted into Table 2 and a duple_id is generated), the duple_id value is synchronized to the duple_id field of the corresponding article in Table 1.
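The rule of resetting modify_status only for articles that were not touched during the check is a form of optimistic update. A minimal sketch, assuming Table 1 is a dictionary of rows and that `snapshot` (a hypothetical name) records each article's modify_status as read before the check started:

```python
from typing import Dict, List


def mark_checked(
    table1: Dict[int, Dict[str, object]],
    snapshot: Dict[int, int],
    generated_ids: Dict[int, int],
) -> List[int]:
    """Backfill duple_id and reset modify_status where it is unchanged.

    Returns the ids of the articles whose modify_status was set to 0.
    """
    updated: List[int] = []
    for article_id, status_before in snapshot.items():
        row = table1.get(article_id)
        if row is None:
            continue  # the row was deleted in the meantime
        if article_id in generated_ids:
            # Backfill the identifier generated in step 102.
            row["duple_id"] = generated_ids[article_id]
        if row["modify_status"] == status_before:
            row["modify_status"] = 0  # checked, nothing changed since
            updated.append(article_id)
        # Otherwise the trigger changed the status during the check;
        # the article stays pending and is picked up by the next run.
    return updated
```

Whether the backfill of duple_id should also be skipped for changed rows is not stated in the text; the sketch applies it unconditionally.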
For step 105, from table 2, find and satisfy the heavy original text information that sends the real-time reminding condition, reuse family workbench transmission to looking into one by one, and the interface that provides by the real-time communication instrument sends the sign and issue user of heavy original text information reminding to contribution, the user reuses the family workbench by looking into, select certain conditions, check the heavy original text record tabulation that all comprise heavy original text information.
The user selects certain the bar record in the tabulation of heavy original text record, and click is checked, can check all heavy original text information of concrete certain contribution.
Second embodiment provided by the invention is a kind of system of article duplicate checking, and its structure comprises as shown in Figure 2:
Event trigger 202: be used for the contribution information in production data storehouse, because of the contribution on the space of a whole page is operated by after the corresponding modification, obtain amended contribution information, described contribution information comprises the contribution content;
Look into heavy server 204: be used for the contribution information that the contribution information of obtaining is not carried out the comparison of repetition contribution content is carried out repetition contribution content relatively, determine heavy original text information.
Further, event trigger 202: the status information that also is used to obtain amended contribution;
Look into heavy server 204: the contribution information stores that also is used for obtaining is to looking into heavy database, determine not carry out repetition contribution content contribution information relatively according to the status information of contribution, carry out repetition contribution content relatively to looking into the contribution information of not carrying out the comparison of repetition contribution content in the heavy database, the heavy original text information of determining is kept at looks into heavy database.
Further, the event trigger 202 is also used to obtain the modified article information after the article information in the production database has been correspondingly modified because an article on the layout has been signed off, has been moved to a different layout after sign-off, has had its sign-off information modified, or has had its sign-off withdrawn, wherein the article information comprises article content;
The duplicate checking server 204 is also used to: for a newly signed-off article, perform article duplicate content comparison, save the determined article duplicate information in the duplicate checking database, save the article information of the newly signed-off article in the duplicate checking database, and notify the production database that article duplicate content comparison has been performed on the article;
For an article whose sign-off has been withdrawn: if article duplicate content comparison has already been performed, delete the corresponding article information and article duplicate information from the duplicate checking database, and notify the production database to perform the deletion; if duplicate checking has not been performed, directly notify the production database to perform the deletion;
For an article that has been moved to a different layout after sign-off or whose sign-off information has been modified: if article duplicate content comparison has not been performed, perform article duplicate content comparison on the article, save the determined article duplicate information in the duplicate checking database, save the article information of the article in the duplicate checking database, and notify the production database that article duplicate content comparison has been performed on the article; if article duplicate content comparison has already been performed, save the article information of the article in the duplicate checking database and notify the production database.
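The three cases handled by the duplicate checking server can be sketched as a small dispatch routine. This is an illustrative Python sketch only: the class `DuplicateCheckServer`, the enum `Op`, the notification tuples, and the exact-match comparison are all hypothetical assumptions introduced for the example and are not specified by the invention.

```python
from enum import Enum


class Op(Enum):
    SIGN_OFF = "sign_off"          # newly signed-off article
    WITHDRAW = "withdraw"          # sign-off withdrawn
    MOVE_OR_EDIT = "move_or_edit"  # moved after sign-off, or sign-off info modified


class DuplicateCheckServer:
    def __init__(self):
        self.articles = {}       # duplicate checking database: id -> content
        self.duplicates = {}     # id -> ids of stored articles with equal content
        self.notifications = []  # stand-in for messages to the production database

    def _compare(self, article_id, content):
        # Placeholder comparison: exact content equality (assumption).
        matches = sorted(
            i for i, c in self.articles.items()
            if i != article_id and c == content
        )
        if matches:
            self.duplicates[article_id] = matches
        else:
            self.duplicates.pop(article_id, None)

    def handle(self, op, article_id, content, already_checked):
        if op is Op.WITHDRAW:
            if already_checked:
                # Remove the stored article information and its duplicate information.
                self.articles.pop(article_id, None)
                self.duplicates.pop(article_id, None)
            # In both sub-cases the production database is told to delete.
            self.notifications.append(("deleted", article_id))
        elif op is Op.SIGN_OFF or not already_checked:
            # New sign-off, or a moved/edited article never compared before.
            self._compare(article_id, content)
            self.articles[article_id] = content
            self.notifications.append(("checked", article_id))
        else:
            # Moved/edited article already compared: only refresh stored info.
            self.articles[article_id] = content
            self.notifications.append(("updated", article_id))
```

In this sketch a withdrawal removes only the withdrawn article's own records; whether duplicate records of other articles that point to it should also be cleaned up is left open here.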
Further, the duplicate checking server 204 is also used to send the article duplicate information to a duplicate checking user workbench;
The system further comprises:
A duplicate checking user workbench 206, used to display the article duplicate information.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (10)

1. A method for article duplicate checking, characterized in that it comprises:
After article information in a production database is correspondingly modified because an article on a layout has been operated on, an event trigger obtains the modified article information, the article information comprising article content;
A duplicate checking server performs article duplicate content comparison on that part of the obtained article information which has not undergone article duplicate content comparison, and determines article duplicate information.
2. The method as claimed in claim 1, characterized in that, before the step in which the duplicate checking server performs article duplicate content comparison, the method further comprises:
The event trigger obtains status information of the modified article;
The duplicate checking server performing article duplicate content comparison is specifically:
The duplicate checking server stores the obtained article information in a duplicate checking database, determines from the status information of the article which article information has not undergone article duplicate content comparison, performs article duplicate content comparison on that article information in the duplicate checking database, and saves the determined article duplicate information in the duplicate checking database.
3. The method as claimed in claim 2, characterized in that operating on the article on the layout comprises:
Signing off the article on the layout, moving the article to a different layout after sign-off, modifying the sign-off information, and withdrawing the sign-off.
4. The method as claimed in claim 3, characterized in that, for a newly signed-off article, the duplicate checking server performs article duplicate content comparison, saves the determined article duplicate information in the duplicate checking database, saves the article information of the newly signed-off article in the duplicate checking database, and notifies the production database that article duplicate content comparison has been performed on the article;
For an article whose sign-off has been withdrawn, if article duplicate content comparison has already been performed, the duplicate checking server deletes the corresponding article information and article duplicate information from the duplicate checking database and notifies the production database to perform the deletion; if duplicate checking has not been performed, the duplicate checking server directly notifies the production database to perform the deletion;
For an article that has been moved to a different layout after sign-off or whose sign-off information has been modified, if article duplicate content comparison has not been performed, the duplicate checking server performs article duplicate content comparison on the article, saves the determined article duplicate information in the duplicate checking database, saves the article information of the article in the duplicate checking database, and notifies the production database that article duplicate content comparison has been performed on the article; if article duplicate content comparison has already been performed, the duplicate checking server saves the article information of the article in the duplicate checking database and notifies the production database.
5. The method as claimed in claim 1, characterized in that the duplicate checking server sends the article duplicate information to a duplicate checking user workbench, and the duplicate checking user workbench displays the article duplicate information.
6. A system for article duplicate checking, characterized in that it comprises:
An event trigger, used to obtain modified article information after article information in a production database is correspondingly modified because an article on a layout has been operated on, the article information comprising article content;
A duplicate checking server, used to perform article duplicate content comparison on that part of the obtained article information which has not undergone article duplicate content comparison, and to determine article duplicate information.
7. The system as claimed in claim 6, characterized in that the event trigger is also used to obtain status information of the modified article;
The duplicate checking server is also used to store the obtained article information in a duplicate checking database, determine from the status information of the article which article information has not undergone article duplicate content comparison, perform article duplicate content comparison on that article information in the duplicate checking database, and save the determined article duplicate information in the duplicate checking database.
8. The system as claimed in claim 7, characterized in that the event trigger is also used to obtain the modified article information after the article information in the production database has been correspondingly modified because an article on the layout has been signed off, has been moved to a different layout after sign-off, has had its sign-off information modified, or has had its sign-off withdrawn, the article information comprising article content.
9. The system as claimed in claim 8, characterized in that the duplicate checking server is also used to: for a newly signed-off article, perform article duplicate content comparison, save the determined article duplicate information in the duplicate checking database, save the article information of the newly signed-off article in the duplicate checking database, and notify the production database that article duplicate content comparison has been performed on the article;
For an article whose sign-off has been withdrawn: if article duplicate content comparison has already been performed, delete the corresponding article information and article duplicate information from the duplicate checking database, and notify the production database to perform the deletion; if duplicate checking has not been performed, directly notify the production database to perform the deletion;
For an article that has been moved to a different layout after sign-off or whose sign-off information has been modified: if article duplicate content comparison has not been performed, perform article duplicate content comparison on the article, save the determined article duplicate information in the duplicate checking database, save the article information of the article in the duplicate checking database, and notify the production database that article duplicate content comparison has been performed on the article; if article duplicate content comparison has already been performed, save the article information of the article in the duplicate checking database and notify the production database.
10. The system as claimed in claim 6, characterized in that the duplicate checking server is also used to send the article duplicate information to a duplicate checking user workbench;
The system further comprises:
A duplicate checking user workbench, used to display the article duplicate information.
CN2008102392929A 2008-12-08 2008-12-08 Article duplicate checking method and system Active CN101751423B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008102392929A CN101751423B (en) 2008-12-08 2008-12-08 Article duplicate checking method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008102392929A CN101751423B (en) 2008-12-08 2008-12-08 Article duplicate checking method and system

Publications (2)

Publication Number Publication Date
CN101751423A true CN101751423A (en) 2010-06-23
CN101751423B CN101751423B (en) 2012-10-31

Family

ID=42478414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008102392929A Active CN101751423B (en) 2008-12-08 2008-12-08 Article duplicate checking method and system

Country Status (1)

Country Link
CN (1) CN101751423B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102117293A (en) * 2009-12-30 2011-07-06 中国银联股份有限公司 Dynamic file positioning and query method
CN102117293B (en) * 2009-12-30 2014-03-19 中国银联股份有限公司 Dynamic file positioning and query method
CN102156703A (en) * 2011-01-24 2011-08-17 南开大学 Low-power consumption high-performance repeating data deleting system
CN104182395A (en) * 2013-05-21 2014-12-03 北大方正集团有限公司 Quality inspection device and method for digital periodicals
CN104182395B (en) * 2013-05-21 2017-07-07 北大方正集团有限公司 The quality inspection device and quality detecting method of digital journals
CN108595439A (en) * 2018-05-04 2018-09-28 北京中科闻歌科技股份有限公司 A kind of character spread path analysis method and system
CN108595439B (en) * 2018-05-04 2022-04-12 北京中科闻歌科技股份有限公司 Method and system for analyzing character propagation path
CN113064919A (en) * 2021-03-31 2021-07-02 北京达佳互联信息技术有限公司 Data processing method, data storage system, computer device and storage medium
CN113064919B (en) * 2021-03-31 2022-11-22 北京达佳互联信息技术有限公司 Data processing method, data storage system, computer device and storage medium

Also Published As

Publication number Publication date
CN101751423B (en) 2012-10-31

Similar Documents

Publication Publication Date Title
CN102640151B (en) Transformed data recording method and system
CN103678556B (en) The method and processing equipment of columnar database processing
US20050086256A1 (en) Data structure and management system for a superset of relational databases
US8321390B2 (en) Methods and apparatus for organizing data in a database
CN100530187C (en) Method for converting search inquiry into inquiry statement
CN101751423B (en) Article duplicate checking method and system
CN106933836B (en) Data storage method and system based on sub-tables
CN102509012A (en) Method for mapping contents of electronic medical record into electronic medical record standard database
CN102402596A (en) Reading and writing method and system of master slave separation database
CN110990529B (en) Industry detail dividing method and system for enterprises
US20080306788A1 (en) Spen Data Clustering Engine With Outlier Detection
CN106021207A (en) A patent writing system and method
CN108829746A (en) A kind of master data management system and device of database based on memory
CN101702219A (en) Method for generating material information and device thereof
CN110276059A (en) A kind for the treatment of method and apparatus of dynamic statement
CN107526746A (en) The method and apparatus of management document index
US7908243B2 (en) Considering transient data also in reports generated based on data eventually stored in a data-warehouse
CN112328600A (en) Electronic coupon management method
CA2856652C (en) Method and system for data filing systems
JP4189332B2 (en) Database management system, database management method, database registration request program, and database management program
CN113672618A (en) Metadata table-based multi-tenant data processing method and device
EP1687741A1 (en) Data structure and management system for a superset of relational databases
US11151178B2 (en) Self-adapting resource aware phrase indexes
CN109684331A (en) A kind of object storage meta data management device and method based on Kudu
CN102597969A (en) Database management device using key-value store with attributes, and key-value-store structure caching-device therefor

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20220620

Address after: 3007, Hengqin international financial center building, No. 58, Huajin street, Hengqin new area, Zhuhai, Guangdong 519031

Patentee after: New founder holdings development Co.,Ltd.

Patentee after: Beijing Beida Founder Electronics Co., Ltd.

Address before: 100871, Haidian District Fangzheng Road, Beijing, Zhongguancun Fangzheng building, 298, 513

Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.

Patentee before: Beijing Beida Founder Electronics Co., Ltd.