CN108108596A - Method and apparatus for generating a digital fingerprint of a written work - Google Patents
- Publication number
- CN108108596A (application number CN201711329111.7A)
- Authority
- CN
- China
- Prior art keywords
- unit
- works
- article unit
- information
- make
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/10—Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
- G06F21/16—Program or content traceability, e.g. by watermarking
Abstract
This application provides a method and apparatus for generating a digital fingerprint of a written work. A device extracts multiple work units from a target written work, each work unit including text information; the device determines multiple preferred work units from among the extracted work units; and the device generates the digital fingerprint of the target written work from the text information in the preferred work units, providing strong support for downstream applications such as piracy detection. Compared with the prior art, this improves the accuracy of copyright identification and makes piracy detection and copyright management faster.
Description
Technical field
This application relates to the field of communications, and in particular to a technique for generating a digital fingerprint of a written work.
Background
As times develop, more and more copyrighted content is disseminated over networks, and digital copyright protection is receiving increasing attention. Although digital watermarking is widely used, it lacks generality and a principled basis, so it cannot be comprehensively tested and measured; moreover, watermarking has not yet fully solved the problem of proving ownership. There is therefore an urgent need for a more accurate and more efficient method of digital copyright protection.
Summary of the invention
The purpose of this application is to provide a method and apparatus for generating a digital fingerprint of a written work.
According to one aspect of this application, a method for generating a digital fingerprint of a written work is provided. The method includes: extracting multiple work units from a target written work, each work unit including text information; determining multiple preferred work units from among the multiple work units; and generating the digital fingerprint of the target written work according to the text information in the preferred work units.
According to another aspect of this application, a device for generating a digital fingerprint of a written work is provided. The device includes a processor and a memory arranged to store computer-executable instructions that, when executed, cause the processor to: extract multiple work units from a target written work, each work unit including text information; determine multiple preferred work units from among the multiple work units; and generate the digital fingerprint of the target written work according to the text information in the preferred work units.
According to another aspect of this application, a computer-readable medium including instructions is provided. When executed, the instructions cause a system to: extract multiple work units from a target written work, each work unit including text information; determine multiple preferred work units from among the multiple work units; and generate the digital fingerprint of the target written work according to the text information in the preferred work units.
Compared with the prior art, this application screens the multiple work units of a written work, retains only the work units closely related to that work, and generates the work's digital fingerprint from the text information of the retained preferred work units, supporting subsequent copyright identification, piracy detection, copyright management, and so on. This improves the accuracy of identifying digital works and makes piracy detection and copyright management faster.
Brief description of the drawings
Other features, objects, and advantages of this application will become more apparent from the following detailed description of non-limiting embodiments, made with reference to the drawings:
Fig. 1 is a flowchart of a method for generating a digital fingerprint of a written work according to some embodiments of this application;
Fig. 2 is a flowchart of part of a method for generating a digital fingerprint of a written work according to some embodiments of this application.
The same or similar reference numerals in the drawings denote the same or similar components.
Detailed description
This application is described in further detail below with reference to the accompanying drawings.
In a typical configuration of this application, the terminal, the device of the service network, and the trusted party each include one or more processors (CPUs), an input/output interface, a network interface, and memory.
Memory may include, among computer-readable media, volatile memory such as random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The device referred to in this application includes, but is not limited to, a user equipment, a network device, or a device formed by integrating a user equipment and a network device over a network. The user equipment includes, but is not limited to, any mobile electronic product capable of human-computer interaction with a user (for example, via a touchpad), such as a smartphone or a tablet computer; the mobile electronic product may run any operating system, such as Android or iOS. The network device includes an electronic device that can automatically perform numerical computation and information processing according to preset or stored instructions; its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, and the like. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud formed by multiple servers; here, the cloud consists of a large number of computers or network servers based on cloud computing, where cloud computing is a form of distributed computing: a virtual supercomputer made up of a group of loosely coupled computers. The network includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, VPNs, wireless ad hoc networks, and the like. Preferably, the device may also be a program running on the user equipment, on the network device, on a device formed by integrating the user equipment and the network device over a network, on a touch terminal, or on a device formed by integrating the network device and a touch terminal over a network.
Of course, those skilled in the art will understand that the above devices are merely examples; other existing or future devices, if applicable to this application, should also fall within the scope of protection of this application and are incorporated herein by reference.
In the description of this application, "multiple" means two or more, unless otherwise specifically defined.
Fig. 1 shows a flowchart of a method for generating a digital fingerprint of a written work according to some embodiments of this application. The method includes step S11, step S12, and step S13. In step S11, the device extracts multiple work units from a target written work, each work unit including text information; in step S12, the device determines multiple preferred work units from among the multiple work units; in step S13, the device generates the digital fingerprint of the target written work according to the text information in the preferred work units. The target written work is the work whose digital fingerprint is to be generated. The solution may be carried out by a network device or by a user equipment. For simplicity, this application describes the embodiments using a network device as the example; those skilled in the art will understand that, unless explicitly stated otherwise, these embodiments can equally be performed by a user equipment.
Specifically, in step S11, the network device extracts multiple work units from the target written work, each work unit including text information. In some embodiments, ways of extracting work units include, but are not limited to: dividing according to the structure of the target written work itself, for example by its table of contents, parts, or chapters; or dividing according to character count, for example taking a number of characters every K characters of the target written work. For example, the target written work 《Xx Travel Notes》 is divided by chapter into "Chapter 1 Getting Ready for the Journey", "Chapter 2 Arriving Safely", "Chapter 3 Tasting the Best Local Food", and so on, and chapters such as "Chapter 3 Tasting the Best Local Food", "Chapter 19 Picturesque Scenery", and "Chapter 20 Travel Reflections" are extracted as work units. In some embodiments, the text information includes, but is not limited to, the text string, the pinyin-initials string corresponding to its characters, or the string length (that is, the number of characters).
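The two extraction strategies above can be sketched as follows. This is a minimal sketch under stated assumptions: chapter headings are assumed to match a regular expression (the `Chapter N` pattern is hypothetical; a Chinese text would need its own heading pattern), and "taking a number of characters every K characters" is read as taking a window of `length` characters starting at every `step`-th character. The function names are illustrative, not from the patent.

```python
import re
from typing import List


def split_by_heading(text: str, heading: str = r"^Chapter \d+.*$") -> List[str]:
    """Split a work into units at lines matching `heading` (hypothetical pattern).

    Any text before the first heading (e.g. a preface) is dropped.
    """
    starts = [m.start() for m in re.finditer(heading, text, flags=re.MULTILINE)]
    if not starts:
        return [text.strip()]  # no headings: treat the whole work as one unit
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


def split_by_count(text: str, step: int, length: int) -> List[str]:
    """Take `length` characters starting at every `step`-th character."""
    if step <= 0 or length <= 0:
        raise ValueError("step and length must be positive")
    return [text[i:i + length] for i in range(0, len(text), step)]


if __name__ == "__main__":
    notes = "Chapter 1 Getting Ready\nPack the bags.\nChapter 2 Arriving Safely\nWe arrived."
    print(split_by_heading(notes))
    print(split_by_count("abcdefghij", step=4, length=2))  # ['ab', 'ef', 'ij']
```

Setting `length == step` turns the sampling variant into a plain split into consecutive K-character chunks.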
In step s 12, the network equipment is made to determine multiple preferably to make article unit in article unit from the multiple.It is here, excellent
Being elected to be article unit includes obtaining and the target text works are more relevant makees article unit or more suitable from the screening of multiple works units
Make article unit together in generation digital finger-print.After upper example, " chapter 3 tastes most cuisines ", " the 19th chapter landscape show from extraction
It is beautiful ", " the 20th chapter travel gains in depth of comprehension " etc. make to screen in article unit, finally definite " the 19th chapter is picturesque ", " the 20th chapter trip
Trip gains in depth of comprehension " are preferably to make article unit.
In step s 13, the network equipment generates the target text according to the text information preferably made in article unit
The digital finger-print of works.In some embodiments, from preferably making to extract part or its whole in article unit, and by its corresponding text
Word information generates the digital finger-print of the target text works.Such as the example above, the text information for preferably making article unit is extracted:Word
String " carries the craftwork of national characters, beauty very much, to allow people too plenty for the eye to take it all in simply there are many it!And most attracting is this wind
The night scene in the graceful peaceful simple and unsophisticated small city of scape, allows people intoxicated simply!", the corresponding first letter of pinyin string of the text strings is
" TYXDDYMZTSDGYPMLJLJZRRMBXJEZXYRDSZFJYMYJGPXCDYJJZRRTZ ", word string length are " 53 ";Finally
The digital finger-print for generating the target text works is " TYXDDYMZTSDGYPMLJLJZRRMBXJEZXYRDSZFJYMYJGPXCDY
JJZRRTZ53”。
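A minimal sketch of the fingerprint format in this example: concatenate the uppercase first letter of each character's pinyin syllable and append the character count. Converting Chinese text to pinyin needs a separate tool (for example the third-party `pypinyin` package) and is assumed to have been done already; the function name `unit_fingerprint` and its one-syllable-per-character input are assumptions, not the patent's interface.

```python
from typing import Iterable


def unit_fingerprint(syllables: Iterable[str]) -> str:
    """Build '<pinyin initials><character count>' from one syllable per character.

    Empty entries (e.g. for punctuation) are skipped and not counted.
    """
    initials = [s[0].upper() for s in syllables if s]
    return "".join(initials) + str(len(initials))


if __name__ == "__main__":
    # Hypothetical pinyin for a six-character unit; "" stands for a comma.
    print(unit_fingerprint(["zhong", "wu", "", "lie", "ri", "dang", "kong"]))  # ZWLRDK6
```

Because punctuation contributes no initial, the appended count equals the number of initials, which matches the 53 initials and the length 53 in the example above.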
In some embodiments, the method further includes step S14 (not shown). In step S14, the network device performs a matching query in a fingerprint database according to the digital fingerprint of the target written work to obtain written works that match the target written work; alternatively, it compares the digital fingerprint of the target written work with the digital fingerprint of a reference written work to determine whether the two works are identical or similar. Here, the fingerprint database stores the digital fingerprints of various written works, including, but not limited to, classic literary works, books and periodicals, scientific and technical works, or news media works; reference written works likewise include, but are not limited to, classic literary works, books and periodicals, scientific and technical works, or news media works.
In some embodiments, continuing the example above, the network device performs a matching query in the fingerprint database with the digital fingerprint "TYXDDYMZTSDGYPMLJLJZRRMBXJEZXYRDSZFJYMYJGPXCDYJJZRRTZ53" of the target written work. In some embodiments, if another digital fingerprint identical to that of the target written work is found, the written work corresponding to that fingerprint is a work matching the target written work. Whether one of the two works is pirated can then be judged by verifying their authorization status.
In some embodiments, continuing the example above, the digital fingerprint of the target written work is "TYXDDYMZTSDGYPMLJLJZRRMBXJEZXYRDSZFJYMYJGPXCDYJJZRRTZ53" and the digital fingerprint of a reference written work is "TYXDDYMZTSDGYPMLJLJZRRMBXJEZXYRDSZFJYMYJGPXCDYJJZRTYBR54". In some embodiments, the string similarity of the two digital fingerprints is obtained by comparing them; for example, a string similarity of 91% exceeds the preset fingerprint string-similarity threshold of 75%, so the two digital fingerprints are determined to be similar. Whether one of the two works is pirated can then be judged by verifying their authorization status. The preset fingerprint string-similarity threshold of 75% may be set by the network device. Here, those skilled in the art may determine the similarity of two strings by means such as the Jaccard distance or the minimum edit distance.
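The comparison can be sketched with the minimum edit (Levenshtein) distance mentioned above, normalized here as 1 − distance / max(length). The normalization and the strict `>` comparison against the threshold are assumptions. With them, the two example fingerprints share their first 50 characters and differ by 5 edits over a maximum length of 56, giving 51/56 ≈ 0.91, consistent with the 91% in the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fingerprint_similarity(a: str, b: str) -> float:
    """Edit-distance similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


def is_similar(a: str, b: str, threshold: float = 0.75) -> bool:
    return fingerprint_similarity(a, b) > threshold


if __name__ == "__main__":
    target = "TYXDDYMZTSDGYPMLJLJZRRMBXJEZXYRDSZFJYMYJGPXCDYJJZRRTZ53"
    reference = "TYXDDYMZTSDGYPMLJLJZRRMBXJEZXYRDSZFJYMYJGPXCDYJJZRTYBR54"
    print(round(fingerprint_similarity(target, reference), 2))  # 0.91
```

The row-by-row dynamic program keeps memory at O(len(b)), which suffices for fingerprints of this size.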
Fig. 2 shows a flowchart of part of a method for generating a digital fingerprint of a written work according to some embodiments of this application. Step S12 of this solution includes sub-step S121 and sub-step S122. In sub-step S121, the network device performs a matching query in a unit database according to each work unit and determines the maximum matching degree between the work unit and the works in the unit database; in sub-step S122, the network device determines multiple preferred work units from among the multiple work units according to the maximum matching degree. For example, the network device sorts the work units in ascending order of their maximum matching degree and takes the first N work units as preferred work units; or it takes the work units whose maximum matching degree is less than or equal to a predetermined matching-degree threshold as preferred work units. In some embodiments, the matching degree may be the ratio of the size of the intersection to the size of the union of the text strings of two work units. Here, the unit database stores common text units, including, but not limited to, classic phrases, sentences, paragraphs, or common sayings, such as "Let us set that matter aside and speak of the other" and "To learn what happens next, hear the next chapter".
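The matching degree described here, intersection size over union size, is the Jaccard index. The sketch below assumes the text strings are compared as sets of characters (the patent does not fix the granularity), and the names `matching_degree` and `max_matching_degree` are hypothetical.

```python
from typing import Iterable


def matching_degree(a: str, b: str) -> float:
    """Jaccard index of the character sets of two text strings."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty strings are treated as identical
    return len(sa & sb) / len(sa | sb)


def max_matching_degree(unit: str, unit_database: Iterable[str]) -> float:
    """Largest matching degree between `unit` and any entry of the unit database."""
    return max((matching_degree(unit, ref) for ref in unit_database), default=0.0)


if __name__ == "__main__":
    print(matching_degree("abc", "bcd"))                         # 0.5
    print(max_matching_degree("abc", ["xyz", "abcd", "bcd"]))    # 0.75
```

A high maximum matching degree means the unit resembles common, widely shared text, which is why such units are poor candidates for a fingerprint.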
For example, for the target written work 《Xx Travel Notes》, work units such as "Chapter 3 Tasting the Best Local Food", "Chapter 19 Picturesque Scenery", and "Chapter 20 Travel Reflections" are extracted, and a matching query is performed for each of them in the unit database to determine its maximum matching degree with the works in the unit database. For example, a matching query in the unit database on the text string of "Chapter 3 Tasting the Best Local Food", "The candied haws on sticks glow bright red; who knows how many eyes they have drawn", yields matching degrees of 0.00, 0.37, 0.58, 0.64, and 0.23 between this work unit and works a, b, c, d, and e in the database, so the maximum matching degree between this work unit and the works in the unit database is 0.64.
A matching query in the unit database on the text string of "Chapter 19 Picturesque Scenery", "Spring rain and spring thunder; the raindrops keep pattering down", yields matching degrees of 0.20, 0.49, and 0.58 with works f, g, and h in the database, so the maximum matching degree between this work unit and the works in the unit database is 0.58. A matching query in the unit database on the text string of "Chapter 20 Travel Reflections", "This place is one of a kind!", yields a matching degree of 0.5 with work i in the database, so the maximum matching degree between this work unit and the works in the unit database is 0.5.
In some embodiments, continuing the example above, the maximum matching degrees 0.64, 0.58, and 0.5 of these work units are compared with a preset maximum-matching-degree threshold of 0.6; work units whose maximum matching degree exceeds the threshold are rejected, and only the remaining work units are determined to be preferred work units. Finally, "Chapter 19 Picturesque Scenery", with a maximum matching degree of 0.58, and "Chapter 20 Travel Reflections", with a maximum matching degree of 0.5, are taken as the preferred work units.
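Both selection options of sub-step S122 (keep units at or below a threshold, or keep the first N in ascending order) can be sketched in one function. Allowing both filters at once is a design choice of this sketch, not something the patent states, and `select_preferred` is a hypothetical name.

```python
from typing import Dict, List, Optional


def select_preferred(
    max_degrees: Dict[str, float],
    threshold: Optional[float] = None,
    top_n: Optional[int] = None,
) -> List[str]:
    """Pick preferred work units: a low maximum matching degree = more distinctive.

    Units are ranked in ascending order of maximum matching degree; units above
    `threshold` are rejected, and at most `top_n` units are kept.
    """
    ranked = sorted(max_degrees, key=lambda unit: max_degrees[unit])
    if threshold is not None:
        ranked = [u for u in ranked if max_degrees[u] <= threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked


if __name__ == "__main__":
    degrees = {"Chapter 3": 0.64, "Chapter 19": 0.58, "Chapter 20": 0.5}
    print(select_preferred(degrees, threshold=0.6))  # ['Chapter 20', 'Chapter 19']
    print(select_preferred(degrees, top_n=1))        # ['Chapter 20']
```

With the threshold of 0.6 from the example, only Chapter 3 (0.64) is rejected, matching the outcome described above.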
In some embodiments, sub-step S121 of this solution includes: the network device performs word segmentation on the work unit to determine the unit description information corresponding to the work unit; performs a matching query in the unit database according to the unit description information to determine the maximum matching degree between this unit description information and the unit description information in the unit database; and uses it as the maximum matching degree between the work unit and the works in the unit database. Here, the unit description information includes, but is not limited to, the keywords of the work unit and their frequencies of occurrence. In some embodiments, the matching degree between two pieces of unit description information is the ratio of the size of the intersection to the size of the union of the keyword strings in the unit description information, where keywords that occur more than once are counted repeatedly. Here, the unit database stores common text units and their unit description information; the stored common text units include, but are not limited to, classic phrases, sentences, paragraphs, or common sayings, such as "Two flowers bloom, each branch told in turn" and "To learn what happens next, hear the next chapter".
For example, the work unit "Looking forward, looking forward, spring has come. Spring is like a newborn baby, spring is like a little girl, spring is like a vigorous youth." is segmented into "look forward/(particle)/look forward/(particle)/spring/come/(particle)/spring/like/just/fall/ground/(particle)/baby/spring/like/little/girl/spring/like/vibrant/(particle)/youth", which determines the unit description information of this work unit: keyword "look forward", frequency 2; keyword "spring", frequency 4. A matching query is then performed in the unit database according to this unit description information to determine the maximum matching degree between it and the unit description information in the unit database. Suppose the query yields matching degrees of 0.02, 0.33, 0.59, 0.67, and 0.26 between this unit description information and unit description information a, b, c, d, and e in the database; the maximum matching degree is then 0.67, and this value is taken as the maximum matching degree between the work unit and the works in the unit database.
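Building the unit description information from a segmented work unit can be sketched with `collections.Counter`. Segmentation itself is assumed to be done by a separate Chinese word segmenter (not shown), and both the stopword set and the `min_count` filter are assumptions: the example above lists only keywords that occur more than once, and `min_count=2` reproduces that on the hypothetical token list below.

```python
from collections import Counter
from typing import Iterable, Set


def unit_description(tokens: Iterable[str], stopwords: Set[str], min_count: int = 1) -> Counter:
    """Map each keyword to its frequency, ignoring stopwords and rare keywords."""
    counts = Counter(t for t in tokens if t.strip() and t not in stopwords)
    return Counter({kw: n for kw, n in counts.items() if n >= min_count})


if __name__ == "__main__":
    # Hypothetical English stand-ins for the segmented tokens.
    tokens = ["look forward", "look forward", "spring", "come", "spring", "like",
              "baby", "spring", "like", "little", "girl", "spring", "like", "youth"]
    print(unit_description(tokens, stopwords={"like", "come"}, min_count=2))
    # Counter({'spring': 4, 'look forward': 2})
```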
The matching degree between this unit description information and a piece of unit description information in the unit database can be computed as in the following example. Suppose the unit database contains the work "Spring is like a newborn baby, new from head to toe, and it is growing. Spring is like a little girl, dressed in bright colors, smiling as she walks. Spring is like a strong youth with arms and legs of iron, leading us forward.", whose unit description information is: keyword "spring", frequency 4. The matching degree between the two pieces of unit description information is then 0.67 (8/12 = 0.67).
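Counting repeated keywords in both the intersection and the union corresponds to a multiset Jaccard index. The sketch below assumes each keyword contributes the smaller of its two frequencies to the intersection and the larger to the union, which is what `Counter`'s `&` and `|` operators compute; the patent's exact counting may differ. For the example, the intersection is 4 ("spring") and the union is 2 + 4 = 6, and 4/6 equals the 8/12 ≈ 0.67 stated above.

```python
from collections import Counter


def description_matching_degree(a: Counter, b: Counter) -> float:
    """Multiset Jaccard index of two keyword-frequency maps."""
    union = sum((a | b).values())          # per keyword: max of the two counts
    if union == 0:
        return 1.0  # two empty descriptions are treated as identical
    return sum((a & b).values()) / union   # per keyword: min of the two counts


if __name__ == "__main__":
    unit = Counter({"look forward": 2, "spring": 4})
    reference = Counter({"spring": 4})
    print(round(description_matching_degree(unit, reference), 2))  # 0.67
```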
In some embodiments, step S12 of this solution includes: the network device rejects, from the multiple work units, the work units that satisfy a predetermined unit elimination rule, and determines multiple preferred work units from the remaining work units. The unit elimination rules include, but are not limited to: (1) a unit matching the work unit exists in the unit database; (2) there is reference unit description information matching the unit description information of the work unit, where the unit description information of the work unit is determined by performing word segmentation on it; (3) the number of characters in the work unit is less than a unit character-count threshold. The unit description information includes, but is not limited to, the keywords of the work unit and their frequencies of occurrence. In some embodiments, the reference unit description information is obtained by processing common text units. In some embodiments, the matching degree is the ratio of the size of the intersection to the size of the union of the keyword strings, where keywords that occur more than once are counted repeatedly.
For example, consider the work units "Spring is a season full of vitality", "Summer is a green season", "Autumn is a season of harvest", and "This winter the wheat is covered with three quilts; next year we will sleep pillowed on steamed buns". In some embodiments, the unit elimination rule includes that a unit matching the work unit exists in the unit database. For instance, the unit database contains the common text unit "This winter the wheat is covered with three quilts; next year we will sleep pillowed on steamed buns", which is identical to that work unit; the matching degree is 1, so the two units match. This work unit is rejected from the work units above, and multiple preferred work units are determined from the remaining work units "Spring is a season full of vitality", "Summer is a green season", and "Autumn is a season of harvest". In some embodiments, the unit elimination rule includes that there is reference unit description information matching the unit description information of the work unit, where the unit description information of the work unit is determined by performing word segmentation on it. For example, the work unit "Autumn is a season of harvest" is segmented as "autumn/is/a/harvest/(particle)/season", which determines its unit description information: keyword "harvest", frequency 1. The reference unit description information of a common text unit is: keyword "harvest", frequency 1. Because reference unit description information matching the unit description information of this work unit exists, this work unit is rejected, and multiple preferred work units are determined from the remaining three work units "Spring is a season full of vitality", "Summer is a green season", and "This winter the wheat is covered with three quilts; next year we will sleep pillowed on steamed buns". In some embodiments, the unit elimination rule includes that the number of characters in the work unit is less than a unit character-count threshold. For example, with a unit character-count threshold of 2, the work unit "Good!" has 1 character, fewer than the threshold of 2, and generally cannot characterize the work well, so it is rejected and multiple preferred work units are determined from the remaining work units. The unit character-count threshold of 2 may be computed statistically by the network device.
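The three elimination rules can be combined into one filter, sketched below under assumptions: rule 1 is an exact text match against the unit database; rule 2 treats two descriptions as matching when their multiset Jaccard index reaches a hypothetical `desc_threshold` (1.0, i.e. identical, by default); rule 3 counts alphanumeric characters, which in Python includes Han characters and skips punctuation. Each unit is passed as a (text, description) pair with the description precomputed; all names are illustrative.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def _jaccard(a: Counter, b: Counter) -> float:
    """Multiset Jaccard index (min counts over max counts)."""
    union = sum((a | b).values())
    return 1.0 if union == 0 else sum((a & b).values()) / union


def char_count(text: str) -> int:
    """Count letters/digits (Han characters included), ignoring punctuation and spaces."""
    return sum(1 for ch in text if ch.isalnum())


def eliminate_units(
    units: Iterable[Tuple[str, Counter]],
    unit_database: Iterable[str],
    reference_descriptions: Iterable[Counter],
    min_chars: int = 2,
    desc_threshold: float = 1.0,
) -> List[str]:
    """Return the texts of work units that survive all three elimination rules."""
    known = {t.strip() for t in unit_database}
    refs = list(reference_descriptions)
    kept = []
    for text, desc in units:
        if text.strip() in known:  # rule 1: unit already in the unit database
            continue
        if any(_jaccard(desc, ref) >= desc_threshold for ref in refs):  # rule 2
            continue
        if char_count(text) < min_chars:  # rule 3: too short to be characteristic
            continue
        kept.append(text)
    return kept


if __name__ == "__main__":
    winter = "This winter the wheat has three quilts"
    units = [
        ("Spring is a season full of vitality", Counter({"spring": 1, "vitality": 1})),
        ("Autumn is a season of harvest", Counter({"harvest": 1})),
        (winter, Counter({"wheat": 1, "quilt": 1})),
        ("A!", Counter()),
    ]
    print(eliminate_units(units, [winter], [Counter({"harvest": 1})]))
    # ['Spring is a season full of vitality']
```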
Those skilled in the art will understand that the above unit elimination rules are merely examples; other existing or future unit elimination rules, such as combinations of the above rules, if applicable to this application, should also fall within the scope of protection of this application and are incorporated herein by reference.
In some embodiments, in step S13, the network device extracts, from the text information in each preferred work unit, the unit character feature information corresponding to that preferred work unit; generates the unit fingerprint of the preferred work unit according to the unit character feature information; and generates the digital fingerprint of the target written work according to the unit fingerprints. Here, the unit character feature information includes, but is not limited to, the text string in the work unit, the string of pinyin initial letters of its characters, or its character count. For example, the preferred work unit "中午，烈日当空" ("At noon, the scorching sun hangs high in the sky") has the pinyin initial-letter string "ZWLRDK" and a character count of 6, and the preferred work unit "傍晚，夕阳西下" ("At dusk, the sun sets in the west") has the pinyin initial-letter string "BWXYXX" and a character count of 6. In some embodiments, only part of the text information of a preferred work unit is extracted as its unit character feature information. For example, from the first unit (text string "中午，烈日当空", initial-letter string "ZWLRDK", character count 6), the text string "烈日当空" ("the scorching sun hangs high in the sky") is extracted, with initial-letter string "LRDK" and character count 4; from the second unit (text string "傍晚，夕阳西下", initial-letter string "BWXYXX", character count 6), the text string "傍晚" ("dusk") is extracted, with initial-letter string "BW" and character count 2. These serve as the unit character feature information of the respective preferred work units, from which the unit fingerprints "LRDK4" and "BW2" are generated. Finally, the digital fingerprint of the target written work is generated from these two unit fingerprints. In some embodiments, the numbers are added up and the letter strings are concatenated in order, so the unit fingerprints "LRDK4" and "BW2" yield the digital fingerprint "LRDKBW6". In other embodiments, the letter strings and numbers are concatenated in order, so the unit fingerprints "LRDK4" and "BW2" yield the digital fingerprint "LRDK4BW2".
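The fingerprint construction in the example above can be reproduced with a short Python sketch. The pinyin table is a hypothetical stand-in that covers only the example characters (a production system would need a full pinyin dictionary and a policy for characters with several readings), punctuation is assumed not to count as a character, and the function names are hypothetical. Both combination embodiments are shown.

```python
import unicodedata
from typing import List, Tuple

# Hypothetical lookup table covering only the example characters.
PINYIN_INITIAL = {
    "中": "Z", "午": "W", "烈": "L", "日": "R", "当": "D", "空": "K",
    "傍": "B", "晚": "W", "夕": "X", "阳": "Y", "西": "X", "下": "X",
}

Feature = Tuple[str, int]  # (pinyin initial-letter string, character count)


def extract_feature(text: str) -> Feature:
    """Initial-letter string and character count of a text string, skipping punctuation."""
    chars = [ch for ch in text if not unicodedata.category(ch).startswith("P")]
    return "".join(PINYIN_INITIAL[ch] for ch in chars), len(chars)


def unit_fingerprint(feature: Feature) -> str:
    """Unit fingerprint: initial-letter string followed by the character count."""
    initials, count = feature
    return f"{initials}{count}"


def work_fingerprint_summed(features: List[Feature]) -> str:
    """Embodiment 1: concatenate the letter strings, add up the counts."""
    return "".join(i for i, _ in features) + str(sum(c for _, c in features))


def work_fingerprint_sequential(features: List[Feature]) -> str:
    """Embodiment 2: concatenate the unit fingerprints in order."""
    return "".join(unit_fingerprint(f) for f in features)


# Fragments extracted from "中午，烈日当空" and "傍晚，夕阳西下".
features = [extract_feature("烈日当空"), extract_feature("傍晚")]
print([unit_fingerprint(f) for f in features])  # ['LRDK4', 'BW2']
print(work_fingerprint_summed(features))        # LRDKBW6
print(work_fingerprint_sequential(features))    # LRDK4BW2
```

The added-up form is shorter but no longer records where one unit ends and the next begins; the concatenated form keeps each unit's count.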
In some embodiments, the unit character feature information includes one or more continuous text strings, and the length of each continuous text string is equal to or greater than a predetermined text-string length threshold. In some embodiments, the text-string length threshold is based on, but not limited to, the average sentence length of the target written work, or the average sentence length of the author of the target written work. In some embodiments, the multiple continuous text strings are not contiguous with one another: they are separated by other characters or punctuation marks, which distinguishes one continuous text string from another. Here, the length of a text string is its number of characters, and the sentence length is the number of characters in a sentence. In some embodiments, the average sentence length of the work is obtained by counting the length of each sentence in the work, summing these lengths, and dividing the sum by the number of sentences. In some embodiments, the author's average sentence length is obtained by computing the average sentence length of each of the author's works, summing these values, and dividing the sum by the number of works.
For example, the unit character feature information of the preferred work units includes three continuous text strings: "The autumn sky is the most beautiful; its color is as azure as the sea, and the sun is rising", "The autumn sky is deep blue and so beautiful, without a cloud, the leaves almost withered; I am intoxicated by this autumn scenery", and "On autumn nights the moon hangs in the sky and looks especially bright, and the mountains can be seen perfectly clearly". Each text string is internally continuous, and no two of them are contiguous. Their lengths are 23, 29, and 27 characters, respectively; compared with the predetermined text-string length threshold of 12, all three strings are longer than the threshold. In some embodiments, the threshold of 12 is determined as follows: the length of each sentence in the target written work is counted, for example 3, 12, 15, 14, 11, 6, 13, 14, 10, and 2; these sum to 100, and their average, 10, is the average sentence length of the target written work. Because most sentences in the target written work are longer than 10 characters, the text-string length threshold is preset to 1.2 times the work's average sentence length, giving a final threshold of 12. Alternatively, in some embodiments, the threshold is determined from the average sentence length of the author of the target written work. For example, if the author of the target written work is Li San, each of Li San's works is analyzed separately to obtain its average sentence length: say 14 for "Work One", 9 for "Work Two", and 13 for "Work Three". These three values sum to 36, so their average is 12, and, in line with the characteristic style of the author's works, the text-string length threshold is finally preset to 12.
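The arithmetic of both threshold examples can be checked with a few lines of Python. The 1.2 factor is taken from the work-based example, where it is chosen because most sentences exceed the average; it is treated here as a tuning parameter rather than a fixed constant, and the function names are hypothetical.

```python
from typing import List, Sequence


def threshold_from_work(sentence_lengths: Sequence[int], factor: float = 1.2) -> float:
    """factor x the work's average sentence length."""
    average = sum(sentence_lengths) / len(sentence_lengths)
    return factor * average


def threshold_from_author(work_averages: Sequence[float]) -> float:
    """Average of the author's per-work average sentence lengths."""
    return sum(work_averages) / len(work_averages)


def long_enough(string_lengths: Sequence[int], threshold: float) -> List[bool]:
    """True for each continuous text string whose length reaches the threshold."""
    return [n >= threshold for n in string_lengths]


work_threshold = threshold_from_work([3, 12, 15, 14, 11, 6, 13, 14, 10, 2])
author_threshold = threshold_from_author([14, 9, 13])  # Work One, Two, Three
print(round(work_threshold, 6), author_threshold)      # 12.0 12.0
print(long_enough([23, 29, 27], work_threshold))       # [True, True, True]
```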
Those skilled in the art will understand that the above ways of determining the text-string length threshold are merely examples. Other ways of determining the threshold, whether existing now or developed in the future, should also fall within the protection scope of this application if they are applicable to it, and are incorporated herein by reference.
In some embodiments, step S13 of this solution includes: the network device extracts the work character feature information of the target written work from the text information in the plurality of preferred work units, and generates the digital fingerprint of the target written work according to the work character feature information.
For example, the preferred work unit "中午，烈日当空" ("At noon, the scorching sun hangs high in the sky") has the pinyin initial-letter string "ZWLRDK" and a character count of 6, and the preferred work unit "傍晚，夕阳西下" ("At dusk, the sun sets in the west") has the pinyin initial-letter string "BWXYXX" and a character count of 6. Part of the text information is extracted from each of these two preferred work units: the text string "烈日当空" (initial-letter string "LRDK", character count 4) and the text string "夕阳西下" ("the setting sun sinks in the west"; initial-letter string "XYXX", character count 4). In some embodiments, the extracted text strings and their initial-letter strings are concatenated in order and the character counts are added up, giving the work character feature information of the target written work: text string "烈日当空夕阳西下", initial-letter string "LRDKXYXX", character count 8. The digital fingerprint of the target written work generated from this work character feature information is "LRDKXYXX8".
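This embodiment merges the extracted fragments into a single work-level feature before fingerprinting. A minimal Python sketch, assuming a hypothetical pinyin table that covers only the example characters and fragments that contain no punctuation; the function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical table covering only the example characters.
PINYIN_INITIAL = {"烈": "L", "日": "R", "当": "D", "空": "K",
                  "夕": "X", "阳": "Y", "西": "X", "下": "X"}


def work_feature(fragments: List[str]) -> Tuple[str, str, int]:
    """Work character feature information: (joined text, joined initials, total count)."""
    text = "".join(fragments)
    initials = "".join(PINYIN_INITIAL[ch] for ch in text)
    return text, initials, len(text)


def work_fingerprint(fragments: List[str]) -> str:
    """Digital fingerprint: joined initial-letter string followed by the total count."""
    _, initials, count = work_feature(fragments)
    return f"{initials}{count}"


# Fragments taken from "中午，烈日当空" and "傍晚，夕阳西下".
print(work_feature(["烈日当空", "夕阳西下"]))      # ('烈日当空夕阳西下', 'LRDKXYXX', 8)
print(work_fingerprint(["烈日当空", "夕阳西下"]))  # LRDKXYXX8
```

For these inputs the result has the same shape as the add-up-and-concatenate embodiment described earlier for unit fingerprints; the difference is that the work-level text string and initial-letter string are assembled first, so they remain available for other uses.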
Those skilled in the art will understand that the above ways of generating the digital fingerprint of the target written work are merely examples. Other ways of generating it, whether existing now or developed in the future, should also fall within the protection scope of this application if they are applicable to it, and are incorporated herein by reference.
This application also provides a computer-readable storage medium storing computer code which, when executed, performs the method according to any one of the preceding embodiments.
This application also provides a computer program product which, when executed by a computer device, performs the method according to any one of the preceding embodiments.
This application also provides a computer device, comprising:
one or more processors;
a memory for storing one or more computer programs;
wherein, when the one or more computer programs are executed by the one or more processors, the one or more processors implement the method according to any one of the preceding embodiments.
It should be noted that this application can be implemented in software and/or a combination of software and hardware, for example using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In one embodiment, the software program of this application can be executed by a processor to implement the steps or functions described above. Likewise, the software program of this application (including related data structures) can be stored in a computer-readable recording medium, for example RAM, a magnetic or optical drive, a floppy disk, or similar devices. In addition, some steps or functions of this application can be implemented in hardware, for example as circuitry that cooperates with a processor to perform each step or function.
In addition, part of this application can be applied as a computer program product, for example computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to this application through the operation of that computer. Those skilled in the art will understand that computer program instructions can exist in a computer-readable medium in forms including, but not limited to, source files, executable files, and installation package files. Correspondingly, the ways in which a computer executes computer program instructions include, but are not limited to: the computer executing the instructions directly; the computer compiling the instructions and then executing the resulting compiled program; the computer reading and executing the instructions; or the computer reading and installing the instructions and then executing the installed program. Here, the computer-readable medium can be any available computer-readable storage medium or communication medium accessible to a computer.
Communication media include media by which communication signals containing, for example, computer-readable instructions, data structures, program modules, or other data are transmitted from one system to another. Communication media can include guided transmission media (such as cables and wires, e.g. optical fiber and coaxial cable) and wireless (unguided) media capable of propagating energy waves, such as acoustic, electromagnetic, RF, microwave, and infrared media. Computer-readable instructions, data structures, program modules, or other data can be embodied, for example, as a modulated data signal in a wireless medium (such as a carrier wave, or a similar mechanism such as one embodied as part of spread-spectrum technology). The term "modulated data signal" refers to a signal one or more of whose characteristics are changed or set in such a manner as to encode information in the signal. The modulation can use analog, digital, or hybrid modulation techniques.
By way of example and not limitation, computer-readable storage media can include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. For example, computer-readable storage media include, but are not limited to: volatile memory, such as random-access memory (RAM, DRAM, SRAM); non-volatile memory, such as flash memory, various read-only memories (ROM, PROM, EPROM, EEPROM), and magnetic and ferromagnetic/ferroelectric memories (MRAM, FeRAM); magnetic and optical storage devices (hard disks, tapes, CDs, DVDs); and other currently known or later-developed media capable of storing computer-readable information/data for use by a computer system.
Here, an embodiment of this application includes a device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to run the methods and/or technical solutions based on the foregoing embodiments of this application.
It is obvious to those skilled in the art that this application is not limited to the details of the above exemplary embodiments, and that it can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as illustrative rather than restrictive; the scope of this application is defined by the appended claims rather than by the above description, and all changes falling within the meaning and range of equivalents of the claims are intended to be embraced in this application. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a device claim can also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not indicate any particular order.
Claims (12)
1. A method for generating a digital fingerprint of a written work, wherein the method comprises:
extracting a plurality of work units from a target written work, wherein each work unit includes text information;
determining a plurality of preferred work units from the plurality of work units;
generating the digital fingerprint of the target written work according to the text information in the preferred work units.
2. The method according to claim 1, wherein the method further comprises:
performing a matching query in a fingerprint database according to the digital fingerprint of the target written work, to obtain a written work that matches the target written work; or
comparing the digital fingerprint of the target written work with the digital fingerprint of a reference written work, to determine whether the target written work and the reference written work are identical or similar.
3. The method according to claim 1, wherein determining the plurality of preferred work units from the plurality of work units comprises:
performing a matching query in a unit database according to the work units, to determine maximum matching degree information between each work unit and the works in the unit database;
determining the plurality of preferred work units from the plurality of work units according to the maximum matching degree information.
4. The method according to claim 3, wherein performing the matching query in the unit database according to the work units to determine the maximum matching degree information between each work unit and the works in the unit database comprises:
performing word segmentation on the work unit to determine unit description information corresponding to the work unit;
performing a matching query in the unit database according to the unit description information, determining the maximum matching degree information between the unit description information and the unit description information in the unit database, and using it as the maximum matching degree information between the work unit and the works in the unit database.
5. The method according to claim 1, wherein determining the plurality of preferred work units from the plurality of work units comprises:
eliminating, from the plurality of work units, work units that satisfy a predetermined unit elimination rule;
determining the plurality of preferred work units from the remaining work units.
6. The method according to claim 5, wherein the unit elimination rule includes at least one of the following:
the work unit has a matching unit in a unit database;
there exists reference unit description information that matches the unit description information of the work unit, wherein the unit description information of the work unit is determined by performing word segmentation on the work unit;
the character count of the work unit is less than a unit character-count threshold.
7. The method according to claim 1, wherein generating the digital fingerprint of the target written work according to the text information in the preferred work units comprises:
extracting, from the text information in each preferred work unit, unit character feature information corresponding to that preferred work unit;
generating a unit fingerprint of each preferred work unit according to its unit character feature information;
generating the digital fingerprint of the target written work according to the unit fingerprints.
8. The method according to claim 7, wherein the unit character feature information includes one or more continuous text strings, and the length of each continuous text string is equal to or greater than a predetermined text-string length threshold.
9. The method according to claim 8, wherein the text-string length threshold is determined based on any one of the following:
the average sentence length of the target written work;
the average sentence length of the author of the target written work.
10. The method according to claim 1, wherein generating the digital fingerprint of the target written work according to the text information in the preferred work units comprises:
extracting work character feature information of the target written work from the text information in the plurality of preferred work units;
generating the digital fingerprint of the target written work according to the work character feature information.
11. A device for generating a digital fingerprint of a written work, wherein the device comprises:
a processor; and
a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the operations of the method according to any one of claims 1 to 10.
12. A computer-readable medium comprising instructions which, when executed, cause a system to perform the operations of the method according to any one of claims 1 to 10.
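Claims 2 to 4 leave the matching-degree measure and the direction of selection open. The following Python sketch is one possible reading, not the claimed implementation: it scores each work unit's keyword set against a unit database using Jaccard similarity (a swapped-in measure), takes the highest score as the unit's maximum matching degree, keeps the k units with the lowest maximum (assuming distinctive units are preferred), and models the fingerprint database of claim 2 as a dictionary. All names and data are hypothetical.

```python
from typing import Dict, List, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def max_matching_degree(keywords: Set[str], unit_db: List[Set[str]]) -> float:
    """Highest similarity between a unit and any unit stored in the database."""
    return max((jaccard(keywords, ref) for ref in unit_db), default=0.0)


def preferred_units(units: Dict[str, Set[str]], unit_db: List[Set[str]], k: int) -> List[str]:
    """Assumption: the k least 'common' units (lowest maximum degree) are preferred."""
    return sorted(units, key=lambda u: max_matching_degree(units[u], unit_db))[:k]


def find_work(fingerprint: str, fingerprint_db: Dict[str, str]) -> Optional[str]:
    """Claim 2, first branch: exact-match query in a fingerprint database."""
    return fingerprint_db.get(fingerprint)


units = {
    "autumn": {"autumn", "harvest", "season"},
    "spring": {"spring", "vitality", "season"},
    "winter": {"winter", "wheat", "quilt", "steamed bun"},
}
unit_db = [{"autumn", "harvest", "season"}, {"summer", "season"}]
print(preferred_units(units, unit_db, k=2))             # ['winter', 'spring']
print(find_work("LRDK4BW2", {"LRDK4BW2": "work-001"}))  # work-001
```

An exact dictionary lookup only finds identical fingerprints; the "similar" branch of claim 2 would need a similarity measure on fingerprints, which the claims do not specify.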
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711329111.7A CN108108596B (en) | 2017-12-13 | 2017-12-13 | Method and equipment for generating digital fingerprints of written works |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711329111.7A CN108108596B (en) | 2017-12-13 | 2017-12-13 | Method and equipment for generating digital fingerprints of written works |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108108596A (en) | 2018-06-01 |
CN108108596B (en) | 2020-12-01 |
Family ID: 62215797
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711329111.7A Active CN108108596B (en) | 2017-12-13 | 2017-12-13 | Method and equipment for generating digital fingerprints of written works |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108108596B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108664900A (en) * | 2018-04-20 | 2018-10-16 | 上海掌门科技有限公司 | Method and apparatus for identifying similarities and differences between written works |
CN109345416A (en) * | 2018-09-12 | 2019-02-15 | 连尚(新昌)网络科技有限公司 | Method and apparatus for recording reference relationships between works |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101315622A (en) * | 2007-05-30 | 2008-12-03 | 香港中文大学 | System and method for detecting file similarity |
CN101976318A (en) * | 2010-11-15 | 2011-02-16 | 北京理工大学 | Detection method of code similarity based on digital fingerprints |
CN102542183A (en) * | 2010-12-17 | 2012-07-04 | 盛乐信息技术(上海)有限公司 | Method and system for detecting copyright of network literature |
CN102855424A (en) * | 2011-06-29 | 2013-01-02 | 盛乐信息技术(上海)有限公司 | Digital fingerprint extraction method and device and literary works identification method and device |
CN102855423A (en) * | 2011-06-29 | 2013-01-02 | 盛乐信息技术(上海)有限公司 | Tracking method and device of literary works |
US8799236B1 (en) * | 2012-06-15 | 2014-08-05 | Amazon Technologies, Inc. | Detecting duplicated content among digital items |
CN104679728A (en) * | 2015-02-06 | 2015-06-03 | 中国农业大学 | Text similarity detection device |
2017-12-13: CN application CN201711329111.7A filed (granted as CN108108596B; status: active)
Non-Patent Citations (3)
Title |
---|
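史彦军 (Shi Yanjun) et al.: "Research and Progress in Plagiarized Paper Identification" (抄袭论文识别研究与进展), Journal of Dalian University of Technology (《大连理工大学学报》) * |
类艳春 (Lei Yanchun): "Research and Implementation of a Plagiarized Paper Identification System Based on Discourse Structure" (基于篇章结构的抄袭论文识别系统的研究与实现), China Master's Theses Full-text Database, Information Science and Technology Series (《CNK中国优秀硕士学位论文全文数据库信息科技辑》) * |
董卫博 (Dong Weibo): "Research and Implementation of a Chinese Document Copy Detection System" (中文文档复制检测系统的研究与实现), China Master's Theses Full-text Database, Information Science and Technology Series (《CNK中国优秀硕士学位论文全文数据库信息科技辑》) * |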
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108664900A (en) * | 2018-04-20 | 2018-10-16 | 上海掌门科技有限公司 | Method and apparatus for identifying similarities and differences between written works |
CN109345416A (en) * | 2018-09-12 | 2019-02-15 | 连尚(新昌)网络科技有限公司 | Method and apparatus for recording reference relationships between works |
CN109345416B (en) * | 2018-09-12 | 2021-09-21 | 连尚(新昌)网络科技有限公司 | Method and equipment for recording reference relation between works |
Also Published As
Publication number | Publication date |
---|---|
CN108108596B (en) | 2020-12-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Gaveau et al. | Slowing deforestation in Indonesia follows declining oil palm expansion and lower oil prices | |
KR101855597B1 (en) | Systems and methods for video paragraph captioning using hierarchical recurrent neural networks | |
CN107707545A (en) | Abnormal web page access fragment detection method, device, equipment and storage medium |
CN105488092A (en) | Time-sensitive adaptive online subtopic detection method and system |
CN104281565B (en) | Semantic dictionary construction method and device | |
CN106528846A (en) | Retrieval method and device | |
CN110287309A (en) | Method for rapidly extracting a text summary |
CN110032859A (en) | Abnormal account identification method, device and medium |
CN102999638A (en) | Phishing website detection method based on network group mining |
CN106030527B (en) | System and method for notifying users of applications available for download |
He et al. | Petgen: Personalized text generation attack on deep sequence embedding-based classification models | |
CN108108596A (en) | Method and apparatus for generating a digital fingerprint of a written work |
Barlow et al. | A novel approach to detect phishing attacks using binary visualisation and machine learning | |
CN103678480A (en) | Personalized image retrieval method with hierarchical privacy control |
CN107169011A (en) | Artificial-intelligence-based web page originality recognition method, device and storage medium |
CN104462282B (en) | Information search method and device | |
Yingjie et al. | A zero-watermarking scheme for prose writings | |
Zhu et al. | A4: Evading learning-based adblockers | |
CN104008333B (en) | Installation package detection method and equipment |
CN104008334B (en) | File clustering method and equipment |
CN116467710A (en) | Unbalanced network-oriented malicious software detection method | |
CN107704732A (en) | Method and apparatus for generating a fingerprint of a work |
WO2019019711A1 (en) | Method and apparatus for publishing behaviour pattern data, terminal device and medium | |
Yuan et al. | Utilizing related samples to learn complex queries in interactive concept-based video search | |
CN108664900A (en) | Method and apparatus for identifying similarities and differences between written works |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |
GR01 | Patent grant |