CN108108596A - Method and device for generating a digital fingerprint of a text work - Google Patents

Method and device for generating a digital fingerprint of a text work

Publication number
CN108108596A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711329111.7A
Other languages
Chinese (zh)
Other versions
CN108108596B (en)
Inventor
童小林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhangmen Science and Technology Co Ltd
Original Assignee
Shanghai Zhangmen Science and Technology Co Ltd
Application filed by Shanghai Zhangmen Science and Technology Co Ltd
Priority to CN201711329111.7A
Publication of CN108108596A
Application granted
Publication of CN108108596B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10: Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/16: Program or content traceability, e.g. by watermarking

Abstract

The purpose of the present application is to provide a method and device for generating a digital fingerprint of a text work. The device extracts multiple work units from a target text work, where each work unit includes text information; the device determines multiple preferred work units from the multiple work units; and the device generates the digital fingerprint of the target text work from the text information in the preferred work units, providing strong support for subsequent applications such as piracy detection. Compared with the prior art, the accuracy of identifying copyrighted works is improved, which makes piracy detection and copyright management faster.

Description

Method and device for generating a digital fingerprint of a text work
Technical field
The present application relates to the field of communications, and in particular to a technique for generating a digital fingerprint of a text work.
Background
As time goes on, more and more copyrighted works circulate on networks, and digital copyright protection is receiving increasing attention. Digital watermarking is widely used, but because it lacks generality and a principled basis, it cannot be comprehensively tested and measured; moreover, watermarking has not yet fully solved the problem of proving ownership. A more accurate and more effective method of protecting digital copyright is therefore urgently needed.
Summary of the invention
The purpose of the present application is to provide a method and device for generating a digital fingerprint of a text work.
According to one aspect of the present application, a method for generating a digital fingerprint of a text work is provided, the method comprising: extracting multiple work units from a target text work, where each work unit includes text information; determining multiple preferred work units from the multiple work units; and generating the digital fingerprint of the target text work from the text information in the preferred work units.
According to another aspect of the present application, a device for generating a digital fingerprint of a text work is provided, the device comprising a processor and a memory arranged to store computer-executable instructions that, when executed, cause the processor to: extract multiple work units from a target text work, where each work unit includes text information; determine multiple preferred work units from the multiple work units; and generate the digital fingerprint of the target text work from the text information in the preferred work units.
According to a further aspect of the present application, a computer-readable medium including instructions is provided, the instructions, when executed, causing a system to: extract multiple work units from a target text work, where each work unit includes text information; determine multiple preferred work units from the multiple work units; and generate the digital fingerprint of the target text work from the text information in the preferred work units.
Compared with the prior art, the present application applies a preference process to the multiple work units in a text work, retains only the work units closely related to that text work, and generates the digital fingerprint of the text work from the text information in the preferred work units, so as to support subsequent work identification, piracy detection, copyright management, and the like. This improves the accuracy of identifying digital works and makes piracy detection and copyright management faster.
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent upon reading the detailed description of non-limiting embodiments made with reference to the following drawings:
Fig. 1 shows a flowchart of a method for generating a digital fingerprint of a text work according to some embodiments of the present application;
Fig. 2 shows a flowchart of part of a method for generating a digital fingerprint of a text work according to some embodiments of the present application.
The same or similar reference numerals in the drawings denote the same or similar components.
Detailed description
The present application is described in further detail below with reference to the drawings.
In a typical configuration of the present application, the terminal, the device of the service network, and the trusted party each include one or more processors (CPUs), an input/output interface, a network interface, and memory.
Memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape and disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The device referred to in the present application includes, but is not limited to, a user device, a network device, or a device formed by integrating a user device and a network device over a network. The user device includes, but is not limited to, any mobile electronic product capable of human-computer interaction with a user (for example, through a touchpad), such as a smartphone or tablet computer; the mobile electronic product may run any operating system, such as Android or iOS. The network device includes an electronic device that can automatically perform numerical computation and information processing according to instructions set or stored in advance; its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, and the like. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud formed by multiple servers. Here, the cloud consists of a large number of computers or network servers based on cloud computing, where cloud computing is a form of distributed computing: a virtual supercomputer made up of a group of loosely coupled computers. The network includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, VPNs, wireless ad hoc networks, and the like. Preferably, the device may also be a program running on the user device, on the network device, or on a device formed by integrating a user device with a network device, or a network device with a touch terminal, over a network.
Of course, those skilled in the art will understand that the above devices are only examples; other existing or future devices, where applicable to the present application, should also fall within its scope of protection and are incorporated herein by reference.
In the description of the present application, "multiple" means two or more, unless specifically defined otherwise.
Fig. 1 shows a flowchart of a method for generating a digital fingerprint of a text work according to some embodiments of the present application. The method includes step S11, step S12, and step S13. In step S11, the device extracts multiple work units from a target text work, where each work unit includes text information; in step S12, the device determines multiple preferred work units from the multiple work units; in step S13, the device generates the digital fingerprint of the target text work from the text information in the preferred work units. Here, the target text work is the text work whose digital fingerprint is to be generated. The scheme may be carried out by a network device or by a user device. For brevity, the present application describes the embodiments using a network device as the example; those skilled in the art will understand that, except where an embodiment explicitly states otherwise, these embodiments can equally be carried out by a user device.
Specifically, in step S11, the network device extracts multiple work units from the target text work, where each work unit includes text information. In some embodiments, work units may be extracted in ways including but not limited to: dividing the target text work according to its own structure, for example by its table of contents, parts, or chapters; or dividing it by character count, for example taking several characters as a work unit every K characters. For example, the target text work 《Xx Travel Notes》 is divided by chapter into "Chapter 1: Getting Ready for the Journey", "Chapter 2: Arriving Safely", "Chapter 3: Tasting All the Delicacies", and so on, from which chapters such as "Chapter 3: Tasting All the Delicacies", "Chapter 19: Beautiful Scenery", and "Chapter 20: Reflections on the Trip" are extracted as work units. In some embodiments, the text information includes but is not limited to a text string, the pinyin initial-letter string corresponding to the text, or the text string length (that is, the character count).
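The two extraction strategies described for step S11 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the `take` parameter, and the heading pattern `Chapter N` are assumptions made for the example.

```python
import re


def split_by_chapter(text: str, heading_re: str = r"^Chapter \d+.*$") -> dict[str, str]:
    """Divide a work by its own structure: each heading line starts a new unit.

    The heading pattern is a hypothetical choice; a real work would need
    a pattern that fits its own table of contents.
    """
    units: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        if re.match(heading_re, line):
            current = line.strip()
            units[current] = ""
        elif current is not None:
            units[current] += line + "\n"
    return units


def split_every_k(text: str, k: int, take: int) -> list[str]:
    """Divide by character count: every k characters, take `take` characters."""
    return [text[i:i + take] for i in range(0, len(text), k) if text[i:i + take]]


sample = "Chapter 1 Getting Ready\nfoo\nChapter 2 Arriving Safely\nbar\nbaz\n"
chapters = split_by_chapter(sample)
sampled = split_every_k("abcdefghij", k=4, take=2)
```

Either function returns candidate work units; which strategy fits depends on whether the work has a reliable chapter structure.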
In step S12, the network device determines multiple preferred work units from the multiple work units. Here, a preferred work unit is a work unit, selected from the multiple work units, that is more closely related to the target text work or better suited to generating a digital fingerprint. Continuing the example above, the extracted work units "Chapter 3: Tasting All the Delicacies", "Chapter 19: Beautiful Scenery", "Chapter 20: Reflections on the Trip", and so on are screened, and "Chapter 19: Beautiful Scenery" and "Chapter 20: Reflections on the Trip" are finally determined to be the preferred work units.
In step S13, the network device generates the digital fingerprint of the target text work from the text information in the preferred work units. In some embodiments, part or all of a preferred work unit is extracted, and the digital fingerprint of the target text work is generated from the corresponding text information. Continuing the example above, the text information of the preferred work unit is extracted: the text string is "It has many handicrafts with ethnic character, very beautiful, simply too many for the eye to take in! And most attractive of all is the night view of this scenic, peaceful, unsophisticated little town, which is simply intoxicating!", the pinyin initial-letter string corresponding to the text string is "TYXDDYMZTSDGYPMLJLJZRRMBXJEZXYRDSZFJYMYJGPXCDYJJZRRTZ", and the text string length is "53". The digital fingerprint finally generated for the target text work is "TYXDDYMZTSDGYPMLJLJZRRMBXJEZXYRDSZFJYMYJGPXCDYJJZRRTZ53".
In some embodiments, the method further includes step S14 (not shown). In step S14, the network device performs a matching query in a fingerprint database using the digital fingerprint of the target text work, to obtain text works that match the target text work; alternatively, it compares the digital fingerprint of the target text work with the digital fingerprint of a reference text work to determine whether the target text work and the reference text work are the same or similar. Here, the fingerprint database stores the digital fingerprints of various text works, including but not limited to classic literary works, book and periodical works, scientific and technical works, and news media works; reference text works likewise include but are not limited to classic literary works, book and periodical works, scientific and technical works, and news media works.
In some embodiments, continuing the example above, the network device performs a matching query in the fingerprint database using the digital fingerprint "TYXDDYMZTSDGYPMLJLJZRRMBXJEZXYRDSZFJYMYJGPXCDYJJZRRTZ53" of the target text work. If, for example, another digital fingerprint identical to that of the target text work is found, the text work corresponding to that fingerprint is retrieved; that is, a text work matching the target text work exists. The authorization status of the two text works can then be verified to judge whether one of them is pirated.
In some embodiments, continuing the example above, the digital fingerprint of the target text work is "TYXDDYMZTSDGYPMLJLJZRRMBXJEZXYRDSZFJYMYJGPXCDYJJZRRTZ53" and the digital fingerprint of the reference text work is "TYXDDYMZTSDGYPMLJLJZRRMBXJEZXYRDSZFJYMYJGPXCDYJJZRTYBR54". Comparing the two fingerprints yields their string similarity; for example, a string similarity of 91% exceeds the preset fingerprint string similarity of 75%, so the two digital fingerprints are determined to be similar. The authorization status of the two text works can then be verified to judge whether one of them is pirated. The preset fingerprint string similarity of 75% may be set by the network device. Those skilled in the art may determine the similarity of two strings by means such as the Jaccard distance or the minimum edit distance.
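One way to compare two fingerprints is a similarity derived from the minimum edit distance. The sketch below assumes the similarity is defined as one minus the edit distance divided by the longer string's length; the text does not fix this normalization, so treat it as one plausible choice. The function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


def is_similar(a: str, b: str, threshold: float = 0.75) -> bool:
    """True when the similarity exceeds the preset threshold."""
    return similarity(a, b) > threshold


target = "TYXDDYMZTSDGYPMLJLJZRRMBXJEZXYRDSZFJYMYJGPXCDYJJZRRTZ53"
reference = "TYXDDYMZTSDGYPMLJLJZRRMBXJEZXYRDSZFJYMYJGPXCDYJJZRTYBR54"
similar = is_similar(target, reference)
```

Because the two fingerprints share a long common prefix and differ only near the end, the pair clears the 75% threshold under this definition.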
Fig. 2 shows a flowchart of part of a method for generating a digital fingerprint of a text work according to some embodiments of the present application. Step S12 of the scheme includes sub-step S121 and sub-step S122. In sub-step S121, the network device performs a matching query in a unit database using the work unit, and determines the maximum matching degree between the work unit and the works in the unit database; in sub-step S122, the network device determines multiple preferred work units from the multiple work units according to the maximum matching degree. For example, the network device sorts the work units in ascending order of their maximum matching degree and takes the first N work units as preferred work units; or it takes the work units whose maximum matching degree is lower than or equal to a predetermined matching degree threshold as preferred work units. In some embodiments, the matching degree may be the ratio of the size of the intersection to the size of the union of the text strings of two work units. Here, the unit database stores common text units, including but not limited to classic phrases, sentences, paragraphs, or common sayings, such as "Putting aside one matter, let's talk about the other" and "To learn what happens next, read the next chapter".
For example, the work units "Chapter 3: Tasting All the Delicacies", "Chapter 19: Beautiful Scenery", "Chapter 20: Reflections on the Trip", and so on are extracted from the target text work 《Xx Travel Notes》, and a matching query is performed in the unit database for each to determine its maximum matching degree with the works in the unit database. For example, a matching query in the unit database for the text string content of "Chapter 3: Tasting All the Delicacies", "The candied haws gleam red, drawing who knows how many people's gazes", gives matching degrees of 0.00, 0.37, 0.58, 0.64, and 0.23 with works a, b, c, d, and e in the database, so the maximum matching degree between this work unit and the works in the unit database is 0.64.
A matching query in the unit database for the text string content of "Chapter 19: Beautiful Scenery", "Spring rain and spring thunder, the continuous patter of raindrops drifting down", gives matching degrees of 0.20, 0.49, and 0.58 with works f, g, and h in the database, so the maximum matching degree between this work unit and the works in the unit database is 0.58. A matching query in the unit database for the text string content of "Chapter 20: Reflections on the Trip", "This place is simply superb!", gives a matching degree of 0.5 with work i in the database, so the maximum matching degree between this work unit and the works in the unit database is 0.5.
In some embodiments, as above example makees the maximum matching degree information 0.64,0.58,0.5 of article unit according to these, with Default maximum matching degree threshold value 0.6 is compared, and it is corresponding more than default maximum matching degree threshold value to reject maximum matching degree information Make article unit, only retain and remaining make article unit and be determined as preferably making article unit.It is final to determine that maximum matching degree information is " the 20th chapter tourism gains in depth of comprehension " corresponding works that 0.58 " the 19th chapter is picturesque " and maximum matching degree information are 0.5 Unit, which is used as, preferably makees article unit.
In some embodiments, sub-step S121 of the scheme includes: the network device performs word segmentation on the work unit to determine the unit description information corresponding to the work unit; it then performs a matching query in the unit database using the unit description information, determines the maximum matching degree between this unit description information and the unit description information in the unit database, and uses it as the maximum matching degree between the work unit and the works in the unit database. Here, unit description information includes but is not limited to the keywords of the work unit and their occurrence counts. In some embodiments, the matching degree between two pieces of unit description information is the ratio of the size of the intersection to the size of the union of their keyword text strings, where a keyword text string that occurs more than once is counted repeatedly. Here, the unit database stores common text units together with their unit description information; the stored common text units include but are not limited to classic phrases, sentences, paragraphs, or common sayings, such as "Two flowers bloom, each told in turn" and "To learn what happens next, read the next chapter".
For example, the work unit "Looking forward, looking forward, spring has come; spring is like a newborn baby, spring is like a little girl, spring is like a vigorous youth." is segmented into "looking forward / (particle) / looking forward / (particle) / spring / come / (particle) / spring / like / just / fall / ground / (particle) / baby / spring / like / little / girl / spring / like / vigorous / (particle) / youth", and the unit description information corresponding to this work unit is determined: keyword "looking forward", occurrence count 2; keyword "spring", occurrence count 4. A matching query is then performed in the unit database using this unit description information to determine its maximum matching degree with the unit description information in the unit database. Suppose the query gives matching degrees of 0.02, 0.33, 0.59, 0.67, and 0.26 with unit description information a, b, c, d, and e in the database; the maximum matching degree between this unit description information and the unit description information in the unit database is then 0.67. This value is then used as the maximum matching degree between the work unit and the works in the unit database.
The matching degree between this unit description information and a piece of unit description information in the unit database can be calculated as in the following example. Suppose the unit database contains the work "Spring is like a newborn baby, new from head to toe, and growing. Spring is like a little girl, gaily dressed, laughing and walking. Spring is like a sturdy youth, with iron arms and waist and legs, leading us forward.", whose unit description information is: keyword "spring", occurrence count 4. The matching degree between the two pieces of unit description information is then 0.67 (8/12 = 0.67).
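The 8/12 figure is consistent with counting keyword characters with repetition: assuming each keyword is two Chinese characters in the original text, the shared keyword contributes 4 × 2 = 8 characters to the intersection, and the union adds 2 × 2 = 4 more for the unshared keyword. The sketch below implements that reading with a multiset; the Chinese keyword strings 盼望 and 春天 and the function name are assumptions made for illustration.

```python
from collections import Counter


def description_match(a: Counter, b: Counter) -> float:
    """Multiset intersection over union, weighting each keyword by its length.

    Counter's & and | operators take the per-keyword minimum and maximum
    counts, so repeated keywords are counted repeatedly.
    """
    def weight(c: Counter) -> int:
        return sum(len(keyword) * count for keyword, count in c.items())

    union = weight(a | b)
    return weight(a & b) / union if union else 0.0


# Keyword counts from the worked example; the Chinese strings are assumed.
unit_description = Counter({"盼望": 2, "春天": 4})
database_description = Counter({"春天": 4})
degree = description_match(unit_description, database_description)
```

Under this reading the result is 8/12, which rounds to the 0.67 given in the text.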
In some embodiments, step S12 of the scheme includes: the network device rejects, from the multiple work units, those that satisfy a predetermined unit rejection rule, and determines multiple preferred work units from the remaining work units. The unit rejection rules include but are not limited to:
(1) a unit matching the work unit exists in the unit database; (2) reference unit description information exists that matches the unit description information of the work unit, where the unit description information of the work unit is determined by performing word segmentation on the work unit; (3) the character count of the work unit is below a unit character-count threshold. Here, unit description information includes but is not limited to the keywords of the work unit and their occurrence counts. In some embodiments, the reference unit description information is obtained by processing common text units. In some embodiments, the matching degree is the ratio of the size of the intersection to the size of the union of the keyword text strings, where a keyword text string that occurs more than once is counted repeatedly.
For example, consider the work units "Spring is a season full of vitality", "Summer is a green season", "Autumn is a season of harvest", and "When wheat lies under three quilts of snow this winter, next year you will sleep with steamed buns for a pillow". In some embodiments, the unit rejection rule is that a unit matching the work unit exists in the unit database. For example, the unit database contains the common text unit "When wheat lies under three quilts of snow this winter, next year you will sleep with steamed buns for a pillow", which is identical to that work unit; the matching degree is 1, so the two units match. That work unit is rejected, and multiple preferred work units are determined from the remaining work units "Spring is a season full of vitality", "Summer is a green season", and "Autumn is a season of harvest". In some embodiments, the unit rejection rule is that reference unit description information exists that matches the unit description information of the work unit, where the unit description information of the work unit is determined by performing word segmentation on the work unit. For example, the work unit "Autumn is a season of harvest" is segmented into "autumn / is / a / harvest / (particle) / season", which gives its unit description information: keyword "harvest", occurrence count 1. The reference unit description information obtained from a common text unit is: keyword "harvest", occurrence count 1. Because reference unit description information matching the unit description information of the work unit exists, that work unit is rejected, and multiple preferred work units are determined from the remaining three work units "Spring is a season full of vitality", "Summer is a green season", and "When wheat lies under three quilts of snow this winter, next year you will sleep with steamed buns for a pillow". In some embodiments, the unit rejection rule is that the character count of the work unit is below a unit character-count threshold. For example, with a unit character-count threshold of 2, the work unit "Good!" (a single character in the original Chinese) has a character count of 1, which is below the threshold; such a unit generally cannot reflect the characteristics of the work well, so it is rejected, and multiple preferred work units are determined from the remaining work units. Here, the unit character-count threshold of 2 may be generated by the network device through statistical calculation.
Those skilled in the art will understand that the unit rejection rules above are only examples; other existing or future unit rejection rules, such as combinations of the rules above, where applicable to the present application, should also fall within its scope of protection and are incorporated herein by reference.
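Two of the rejection rules above (an identical unit exists in the unit database; the character count is below a threshold) can be sketched as a simple filter. The keyword-description rule is left out to keep the example short. The function names are hypothetical, and counting only letters and digits as characters is an assumption drawn from the example in which a one-character exclamation has a count of 1.

```python
def char_count(unit: str) -> int:
    """Count letters and digits only, ignoring punctuation and spaces (assumed)."""
    return sum(ch.isalnum() for ch in unit)


def reject_units(units: list[str], common_units: set[str], min_chars: int) -> list[str]:
    """Return the work units that survive both rejection rules."""
    kept = []
    for unit in units:
        if unit in common_units:          # rule: identical unit in the unit database
            continue
        if char_count(unit) < min_chars:  # rule: too short to characterize the work
            continue
        kept.append(unit)
    return kept


units = ["Spring is a season full of vitality", "A saying everyone knows", "好!"]
common = {"A saying everyone knows"}
remaining = reject_units(units, common, min_chars=2)
```

Preferred work units would then be chosen from `remaining`, for example with the matching-degree threshold described earlier.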
In some embodiments, in step S13, the network device extracts the unit text feature information corresponding to each preferred work unit from the text information in that preferred work unit; generates a unit fingerprint for the preferred work unit from the unit text feature information; and generates the digital fingerprint of the target text work from the unit fingerprints. Here, unit text feature information includes but is not limited to the text string in the work unit, the pinyin initial-letter string corresponding to the text, or the character count.
For example, one preferred work unit has the text string "at noon, the hot sun is high in the sky", whose characters correspond to the pinyin initial-letter string "ZWLRDK", with a character count of 6 (counted in the characters of the original Chinese text); another preferred work unit has the text string "at dusk, the sun sets in the west", whose characters correspond to the pinyin initial-letter string "BWXYXX", with a character count of 6. In some embodiments, a part is extracted from the text information of a preferred work unit to serve as the unit character feature information corresponding to that preferred work unit. For example, from the preferred work unit with text string "at noon, the hot sun is high in the sky", pinyin initial-letter string "ZWLRDK" and character count 6, the text string "the hot sun is high in the sky" is extracted, with pinyin initial-letter string "LRDK" and character count 4, as the unit character feature information of that preferred work unit; from the preferred work unit with text string "at dusk, the sun sets in the west", pinyin initial-letter string "BWXYXX" and character count 6, the text string "dusk" is extracted, with pinyin initial-letter string "BW" and character count 2, as the unit character feature information of that preferred work unit. From this unit character feature information (text string "the hot sun is high in the sky", pinyin initial-letter string "LRDK", character count 4; and text string "dusk", pinyin initial-letter string "BW", character count 2), the unit fingerprints of the corresponding preferred work units are generated respectively: "LRDK4" and "BW2". Finally, the digital fingerprint of the target text work is generated from the two unit fingerprints. In some embodiments, generating the digital fingerprint of the target text work from the unit fingerprints includes summing the numbers and concatenating the letter strings in order; for example, the unit fingerprints "LRDK4" and "BW2" yield the digital fingerprint "LRDKBW6". In other embodiments, generating the digital fingerprint of the target text work from the unit fingerprints includes concatenating each letter string together with its number in order; for example, the unit fingerprints "LRDK4" and "BW2" yield the digital fingerprint "LRDK4BW2".
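The two ways of combining unit fingerprints described above can be sketched in Python. This is only an illustrative sketch: the function names are hypothetical (the patent names no functions), and it assumes that a unit fingerprint always consists of uppercase pinyin initials followed by a decimal character count, as in the "LRDK4" and "BW2" examples.

```python
import re

# Assumed layout of a unit fingerprint: uppercase initials, then a decimal count.
_UNIT_FP = re.compile(r"([A-Z]+)(\d+)")


def unit_fingerprint(initials: str, char_count: int) -> str:
    """Build a unit fingerprint such as 'LRDK4' from initials and a character count."""
    return f"{initials}{char_count}"


def split_unit_fingerprint(fp: str) -> tuple[str, int]:
    """Split a unit fingerprint back into its letter string and its number."""
    match = _UNIT_FP.fullmatch(fp)
    if match is None:
        raise ValueError(f"not a unit fingerprint: {fp!r}")
    return match.group(1), int(match.group(2))


def combine_summed(unit_fps: list[str]) -> str:
    """First variant: concatenate the letter strings in order and sum the numbers."""
    parts = [split_unit_fingerprint(fp) for fp in unit_fps]
    letters = "".join(initials for initials, _ in parts)
    total = sum(count for _, count in parts)
    return f"{letters}{total}"


def combine_sequential(unit_fps: list[str]) -> str:
    """Second variant: concatenate each letter string with its own number, in order."""
    return "".join(unit_fps)


fingerprints = [unit_fingerprint("LRDK", 4), unit_fingerprint("BW", 2)]
print(combine_summed(fingerprints))      # LRDKBW6
print(combine_sequential(fingerprints))  # LRDK4BW2
```

The sequential variant keeps the boundary between units recoverable, whereas the summed variant is more compact but loses the per-unit counts.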
In some embodiments, the unit character feature information includes one or more continuous text strings, and the length of each continuous text string is equal to or greater than a predetermined text string length threshold. In some embodiments, the text string length threshold is determined based on, but not limited to: the work average sentence length information of the target text work; or the author average sentence length information corresponding to the author of the target text work. In some embodiments, the multiple continuous text strings are not contiguous with one another; they are separated by other characters or punctuation marks so that each continuous text string can be distinguished from the others. Here, the length of a text string is its character count, and the sentence length information is the character count of each sentence. In some embodiments, the work average sentence length information is obtained by counting the sentence length information of each sentence of the work, summing these values, and taking their average. In some embodiments, the author average sentence length information is obtained by computing the work average sentence length information of each of the author's works, summing these values, and taking their average.
For example, the unit character feature information corresponding to a preferred work unit includes three continuous text strings: "the autumn sky is the most beautiful, its color as azure as the sea, and the sun comes up", "the autumn sky is so blue and beautiful, cloudless, the leaves almost withered, and I am intoxicated by this autumn scenery", and "on an autumn night, the moon hanging in the sky looks especially bright, and the mountains can all be seen clearly". Each text string is continuous, and any two of the continuous text strings are not contiguous with each other. The lengths of the three text strings in the unit character feature information are character counts of 23, 29, and 27 respectively (counted in the characters of the original Chinese text); comparing them with the predetermined text string length threshold of 12 shows that the lengths of all three text strings are greater than the threshold. The predetermined text string length threshold of 12 may be determined as follows. In some embodiments, the sentence length information of each sentence of the target text work is counted, for example "3, 12, 15, 14, 11, 6, 13, 14, 10, 2", which sums to 100 and averages to 10, so the work average sentence length information of the target text work is 10; then, since most of the sentence lengths in the target text work are greater than 10, the text string length threshold is preset to 1.2 times the work average sentence length information of the target text work, giving a final text string length threshold of 12. Alternatively, in some embodiments, the text string length threshold is determined from the author average sentence length information corresponding to the author of the target text work. For example, the author of the target text work is Li San; each of Li San's written works is counted separately to obtain the work average sentence length information of each work, for example 14 for "Written Work One", 9 for "Written Work Two", and 13 for "Written Work Three"; the work average sentence length information of these three works sums to 36 and averages to 12, and the text string length threshold is finally preset to 12 according to the writing style of the author's works.
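The arithmetic in this example can be reproduced with a short Python sketch. The function names and the rounding of the threshold to a whole number of characters are assumptions made for illustration; the text only gives the averages, the factor 1.2, and the resulting threshold of 12.

```python
from statistics import mean


def work_mean_sentence_length(sentence_lengths: list[int]) -> float:
    """Average character count per sentence of one work."""
    return mean(sentence_lengths)


def author_mean_sentence_length(work_means: list[float]) -> float:
    """Average of the per-work mean sentence lengths of one author."""
    return mean(work_means)


def length_threshold(mean_length: float, factor: float = 1.0) -> int:
    """Scale a mean sentence length by a factor and round to whole characters (assumed)."""
    return round(mean_length * factor)


def meets_threshold(string_lengths: list[int], threshold: int) -> list[int]:
    """Keep only the string lengths that are equal to or greater than the threshold."""
    return [n for n in string_lengths if n >= threshold]


# Work-based threshold: the sentence lengths sum to 100, the mean is 10, times 1.2 gives 12.
work_threshold = length_threshold(
    work_mean_sentence_length([3, 12, 15, 14, 11, 6, 13, 14, 10, 2]), factor=1.2
)

# Author-based threshold: the per-work means 14, 9 and 13 sum to 36, the mean is 12.
author_threshold = length_threshold(author_mean_sentence_length([14, 9, 13]))

print(work_threshold, author_threshold)           # 12 12
print(meets_threshold([23, 29, 27], work_threshold))  # [23, 29, 27]
```

The factor of 1.2 is specific to the worked example (chosen because most sentences are longer than the mean); other factors would be passed in the same way.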
Those skilled in the art will understand that the above ways of determining the text string length threshold are merely examples; other existing or future ways of determining the text string length threshold, if applicable to the present application, shall also fall within the scope of protection of the present application and are incorporated herein by reference.
In some embodiments, step S13 of this solution includes: the network device extracts the work character feature information of the target text work from the text information in the multiple preferred work units, and generates the digital fingerprint of the target text work according to the work character feature information.
For example, one preferred work unit has the text string "at noon, the hot sun is high in the sky", whose characters correspond to the pinyin initial-letter string "ZWLRDK", with a character count of 6; another preferred work unit has the text string "at dusk, the sun sets in the west", whose characters correspond to the pinyin initial-letter string "BWXYXX", with a character count of 6. A part is extracted from the text information of each of these two preferred work units, yielding respectively the text string "the hot sun is high in the sky", with pinyin initial-letter string "LRDK" and character count 4, and the text string "the sun sets in the west", with pinyin initial-letter string "XYXX" and character count 4. In some embodiments, the extracted text strings and their corresponding pinyin initial-letter strings are combined in order and the character counts are summed to obtain the work character feature information of the target text work; for example, the resulting work character feature information of the target text work is the text string "the hot sun is high in the sky, the sun sets in the west", pinyin initial-letter string "LRDK XYXX", character count 8. From this work character feature information, the digital fingerprint generated for the target text work is "LRDK XYXX8".
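A minimal Python sketch of this work-level variant follows. The function name and the `separator` parameter are hypothetical; the default separator is a single space only so that the output reproduces the "LRDK XYXX8" string as printed in the example, and it is an assumption that the space is intended.

```python
def work_fingerprint(extracted: list[tuple[str, int]], separator: str = " ") -> str:
    """Combine the parts extracted from all preferred work units into one fingerprint.

    `extracted` holds one (pinyin initials, character count) pair per extracted part.
    The initial-letter strings are joined in order and the character counts are summed.
    """
    letters = separator.join(initials for initials, _ in extracted)
    total = sum(count for _, count in extracted)
    return f"{letters}{total}"


# Parts extracted from the two preferred work units in the example.
print(work_fingerprint([("LRDK", 4), ("XYXX", 4)]))  # LRDK XYXX8
```

Unlike the unit-fingerprint route, no per-unit fingerprint is produced here; the feature information is aggregated for the whole work first and the fingerprint is generated once.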
Those skilled in the art will understand that the above ways of generating the digital fingerprint of the target text work are merely examples; other existing or future ways of generating the digital fingerprint of the target text work, if applicable to the present application, shall also fall within the scope of protection of the present application and are incorporated herein by reference.
The present application also provides a computer-readable storage medium storing computer code; when the computer code is executed, the method of any one of the foregoing is performed.
The present application also provides a computer program product; when the computer program product is executed by a computer device, the method of any one of the foregoing is performed.
The present application also provides a computer device, the computer device including:
One or more processors;
A memory for storing one or more computer programs;
When the one or more computer programs are executed by the one or more processors, the one or more processors are caused to implement the method of any one of the foregoing.
It should be noted that the present application may be implemented in software and/or in a combination of software and hardware; for example, it may be implemented using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In one embodiment, the software program of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software program of the present application (including related data structures) may be stored in a computer-readable recording medium, for example, RAM, a magnetic or optical drive, a floppy disk, or a similar device. In addition, some steps or functions of the present application may be implemented in hardware, for example, as a circuit that cooperates with a processor to perform each step or function.
In addition, part of the present application may be applied as a computer program product, such as computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the present application through the operation of that computer. Those skilled in the art will understand that the forms in which computer program instructions exist in a computer-readable medium include, but are not limited to, source files, executable files, and installation package files; correspondingly, the ways in which computer program instructions are executed by a computer include, but are not limited to: the computer directly executes the instructions; the computer compiles the instructions and then executes the corresponding compiled program; the computer reads and executes the instructions; or the computer reads and installs the instructions and then executes the corresponding installed program. Here, the computer-readable medium may be any available computer-readable storage medium or communication medium that a computer can access.
Communication media include media by which communication signals containing, for example, computer-readable instructions, data structures, program modules, or other data are transmitted from one system to another. Communication media may include guided transmission media (such as cables and wires, for example, optical fiber or coaxial cable) and wireless (unguided) media capable of propagating energy waves, such as acoustic, electromagnetic, RF, microwave, and infrared. Computer-readable instructions, data structures, program modules, or other data may be embodied, for example, as a modulated data signal in a wireless medium (such as a carrier wave or a similar mechanism, for instance one embodied as part of spread-spectrum technology). The term "modulated data signal" refers to a signal one or more of whose characteristics have been changed or set in such a way as to encode information in the signal. The modulation may be analog, digital, or a hybrid modulation technique.
By way of example and not limitation, computer-readable storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. For example, computer-readable storage media include, but are not limited to: volatile memory, such as random-access memory (RAM, DRAM, SRAM); non-volatile memory, such as flash memory, various read-only memories (ROM, PROM, EPROM, EEPROM), and magnetic and ferromagnetic/ferroelectric memories (MRAM, FeRAM); magnetic and optical storage devices (hard disk, tape, CD, DVD); and other media, now known or developed in the future, capable of storing computer-readable information/data for use by a computer system.
Here, one embodiment of the present application includes a device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to run the methods and/or technical solutions based on the foregoing embodiments of the present application.
It is obvious to those skilled in the art that the present application is not limited to the details of the above exemplary embodiments, and that the present application can be implemented in other specific forms without departing from its spirit or essential characteristics. Therefore, the embodiments are to be regarded in every respect as illustrative and not restrictive, and the scope of the present application is defined by the appended claims rather than by the above description; all changes that fall within the meaning and range of equivalency of the claims are therefore intended to be embraced in the present application. No reference sign in a claim should be construed as limiting the claim concerned. In addition, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a device claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (12)

1. A method for generating a digital fingerprint of a written work, wherein the method includes:
Extracting multiple work units from a target text work, wherein each work unit includes text information;
Determining multiple preferred work units from the multiple work units;
Generating the digital fingerprint of the target text work according to the text information in the preferred work units.
2. The method according to claim 1, wherein the method further includes:
Performing a matching query in a fingerprint database according to the digital fingerprint of the target text work, to obtain written works that match the target text work; or,
Determining whether the target text work and a reference text work are identical or similar by comparing the digital fingerprint of the target text work with the digital fingerprint of the reference text work.
3. The method according to claim 1, wherein determining multiple preferred work units from the multiple work units includes:
Performing a matching query in a unit database according to the work unit, to determine maximum matching degree information between the work unit and the works in the unit database;
Determining multiple preferred work units from the multiple work units according to the maximum matching degree information.
4. The method according to claim 3, wherein performing a matching query in the unit database according to the work unit, to determine the maximum matching degree information between the work unit and the works in the unit database, includes:
Performing word segmentation on the work unit to determine the unit description information corresponding to the work unit;
Performing a matching query in the unit database according to the unit description information, determining the maximum matching degree information between the unit description information and the unit description information in the unit database, and taking it as the maximum matching degree information between the work unit and the works in the unit database.
5. The method according to claim 1, wherein determining multiple preferred work units from the multiple work units includes:
Rejecting, from the multiple work units, the work units that satisfy a predetermined unit rejection rule;
Determining multiple preferred work units from the remaining work units of the multiple work units.
6. The method according to claim 5, wherein the unit rejection rule includes at least any one of the following:
A unit matching the work unit exists in the unit database;
Reference unit description information exists that matches the unit description information of the work unit, wherein the unit description information of the work unit is determined by performing word segmentation on the work unit;
The character count of the work unit is less than a unit character count threshold.
7. The method according to claim 1, wherein generating the digital fingerprint of the target text work according to the text information in the preferred work units includes:
Extracting, from the text information in the preferred work unit, the unit character feature information corresponding to the preferred work unit;
Generating a unit fingerprint of the preferred work unit according to the unit character feature information;
Generating the digital fingerprint of the target text work according to the unit fingerprint.
8. The method according to claim 7, wherein the unit character feature information includes one or more continuous text strings, and the length of each continuous text string is equal to or greater than a predetermined text string length threshold.
9. The method according to claim 8, wherein the text string length threshold is determined based on any one of the following:
The work average sentence length information of the target text work;
The author average sentence length information corresponding to the author of the target text work.
10. The method according to claim 1, wherein generating the digital fingerprint of the target text work according to the text information in the preferred work units includes:
Extracting the work character feature information of the target text work from the text information in the multiple preferred work units;
Generating the digital fingerprint of the target text work according to the work character feature information.
11. A device for generating a digital fingerprint of a written work, wherein the device includes:
A processor; and
A memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the operations of any one of claims 1 to 10.
12. A computer-readable medium including instructions which, when executed, cause a system to perform the operations of any one of claims 1 to 10.
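As an illustration only, and not as part of the claims, two of the claimed steps can be sketched in Python: rejecting work units by rule (two of the rules listed in claim 6) and querying a fingerprint database (first alternative of claim 2). Everything named here is hypothetical. The sketch assumes that work units are plain strings, that "matching" means exact equality, that the character count is simply the string length, and that the fingerprint database is an in-memory mapping from work title to fingerprint; the claims do not prescribe any of these.

```python
def reject_units(units: list[str], unit_database: set[str], min_chars: int) -> list[str]:
    """Drop work units that are too short or already present in the unit database.

    Exact string equality stands in for the matching described in the claims.
    """
    return [
        unit
        for unit in units
        if len(unit) >= min_chars and unit not in unit_database
    ]


def matching_works(fingerprint: str, fingerprint_db: dict[str, str]) -> list[str]:
    """Return the titles of all works whose stored fingerprint equals the query."""
    return [title for title, stored in fingerprint_db.items() if stored == fingerprint]


# Placeholder data, invented purely for this illustration.
remaining = reject_units(
    ["abc", "abcdefgh", "ijklmnop"], unit_database={"ijklmnop"}, min_chars=5
)
print(remaining)  # ['abcdefgh']

db = {"Work A": "LRDK4BW2", "Work B": "XYXX4"}
print(matching_works("LRDK4BW2", db))  # ['Work A']
```

A similarity-based comparison, as in the second alternative of claim 2, would replace the equality test with some measure of fingerprint similarity; the claims leave that measure open.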
CN201711329111.7A 2017-12-13 2017-12-13 Method and equipment for generating digital fingerprints of written works Active CN108108596B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711329111.7A CN108108596B (en) 2017-12-13 2017-12-13 Method and equipment for generating digital fingerprints of written works

Publications (2)

Publication Number Publication Date
CN108108596A true CN108108596A (en) 2018-06-01
CN108108596B CN108108596B (en) 2020-12-01

Family

ID=62215797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711329111.7A Active CN108108596B (en) 2017-12-13 2017-12-13 Method and equipment for generating digital fingerprints of written works

Country Status (1)

Country Link
CN (1) CN108108596B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664900A (en) * 2018-04-20 2018-10-16 上海掌门科技有限公司 A kind of method and apparatus of the similarities and differences of writing for identification
CN109345416A (en) * 2018-09-12 2019-02-15 连尚(新昌)网络科技有限公司 It is a kind of for recording the method and apparatus of the adduction relationship between works

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101315622A (en) * 2007-05-30 2008-12-03 香港中文大学 System and method for detecting file similarity
CN101976318A (en) * 2010-11-15 2011-02-16 北京理工大学 Detection method of code similarity based on digital fingerprints
CN102542183A (en) * 2010-12-17 2012-07-04 盛乐信息技术(上海)有限公司 Method and system for detecting copyright of network literature
CN102855424A (en) * 2011-06-29 2013-01-02 盛乐信息技术(上海)有限公司 Digital fingerprint extraction method and device and literary works identification method and device
CN102855423A (en) * 2011-06-29 2013-01-02 盛乐信息技术(上海)有限公司 Tracking method and device of literary works
US8799236B1 (en) * 2012-06-15 2014-08-05 Amazon Technologies, Inc. Detecting duplicated content among digital items
CN104679728A (en) * 2015-02-06 2015-06-03 中国农业大学 Text similarity detection device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
史彦军等: "抄袭论文识别研究与进展", 《大连理工大学学报》 *
类艳春: "基于篇章结构的抄袭论文识别系统的研究与实现", 《CNK中国优秀硕士学位论文全文数据库信息科技辑》 *
董卫博: "中文文档复制检测系统的研究与实现", 《CNK中国优秀硕士学位论文全文数据库信息科技辑》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664900A (en) * 2018-04-20 2018-10-16 上海掌门科技有限公司 A kind of method and apparatus of the similarities and differences of writing for identification
CN109345416A (en) * 2018-09-12 2019-02-15 连尚(新昌)网络科技有限公司 It is a kind of for recording the method and apparatus of the adduction relationship between works
CN109345416B (en) * 2018-09-12 2021-09-21 连尚(新昌)网络科技有限公司 Method and equipment for recording reference relation between works

Also Published As

Publication number Publication date
CN108108596B (en) 2020-12-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant