CN107832303A - Method and device for recognizing book titles in ancient books - Google Patents

Method and device for recognizing book titles in ancient books

Info

Publication number
CN107832303A
Authority
CN
China
Prior art keywords
title
ancient books
marked
saved
participle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711177794.9A
Other languages
Chinese (zh)
Inventor
洪涛
干生洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gu Lian (beijing) Digital Media Technology Co Ltd
Original Assignee
Gu Lian (beijing) Digital Media Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gu Lian (beijing) Digital Media Technology Co Ltd
Priority to CN201711177794.9A
Publication of CN107832303A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a method and device for recognizing book titles in ancient books. An ancient-book text is obtained and segmented with a segmentation algorithm based on an ancient-book statistical model, yielding multiple tokens (segmented words). Each token is classified against a pre-saved title knowledge base, where the categories include: book title, chapter title, music-and-dance title, title abbreviation, chapter-title abbreviation, and title containing a personal name. For each token, when the token is judged to belong to exactly one of the categories, the token is saved as a target title of the ancient-book text. The method alleviates two problems: traditional manual title marking is time-consuming and laborious, and traditional machine marking performs poorly. It improves the precision and quality of title recognition in ancient books and thereby meets the practical needs of collating and publishing ancient books.

Description

Method and device for recognizing book titles in ancient books
Technical field
The present invention relates to the field of data processing, and in particular to a method and device for recognizing book titles in ancient books.
Background
Title marks are used in ancient books and in some works of literature and history to mark book titles, chapter titles, music-and-dance titles and the like, and they help readers read and understand ancient documents. In the traditional collation and publishing of ancient books, adding title marks relies mainly on manual work and on the knowledge and experience of experts and scholars, so it is time-consuming, laborious and slow. In recent years, with the development of digitization and information technology, domestic scholars have begun to explore machine punctuation by computer. However, because of the complexity of classical Chinese grammar, traditional machine punctuation performs unsatisfactorily, and no mature method exists for adding title marks to ancient books.
Summary of the invention
In view of this, the purpose of the embodiments of the present invention is to provide a method and device for recognizing book titles in ancient books, so as to alleviate the problems that traditional manual title marking is time-consuming and laborious and that traditional machine marking performs poorly.
In a first aspect, an embodiment of the present invention provides a method for recognizing book titles in ancient books. The method includes: obtaining an ancient-book text and segmenting it with a segmentation algorithm based on an ancient-book statistical model to obtain multiple tokens; classifying each token against a pre-saved title knowledge base, where the categories include: book title, chapter title, music-and-dance title, title abbreviation, chapter-title abbreviation, and title containing a personal name; and, for each token, saving the token as a target title of the ancient-book text when the token is judged to belong to exactly one of the categories.
In a second aspect, an embodiment of the present invention provides a device for recognizing book titles in ancient books. The device includes: an acquisition module, for obtaining an ancient-book text and segmenting it with a segmentation algorithm based on an ancient-book statistical model to obtain multiple tokens; a classification module, for classifying each token against a pre-saved title knowledge base, where the categories include: book title, chapter title, music-and-dance title, title abbreviation, chapter-title abbreviation, and title containing a personal name; and an execution module, for saving a token as a target title of the ancient-book text when the token is judged to belong to exactly one of the categories.
Compared with the prior art, the method and device proposed by the embodiments of the present invention obtain an ancient-book text, segment it with a segmentation algorithm based on an ancient-book statistical model, classify each resulting token against a pre-saved title knowledge base under the six categories above, and save as a target title every token that belongs to exactly one category. The method alleviates the problems that traditional manual title marking is time-consuming and laborious and that traditional machine marking performs poorly. It improves the precision and quality of title recognition in ancient books and thereby meets the practical needs of collating and publishing ancient books.
To make the above objects, features and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required by the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope. Those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 is a block diagram of the electronic device provided by an embodiment of the present invention;
Fig. 2 is a flow chart of an ancient-book title recognition method provided by the first embodiment of the present invention;
Fig. 3 is a flow chart of another ancient-book title recognition method provided by the first embodiment of the present invention;
Fig. 4 is a flow chart of yet another ancient-book title recognition method provided by the first embodiment of the present invention;
Fig. 5 is a block diagram of an ancient-book title recognition device provided by the second embodiment of the present invention;
Fig. 6 is a block diagram of another ancient-book title recognition device provided by the second embodiment of the present invention;
Fig. 7 is a block diagram of yet another ancient-book title recognition device provided by the second embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the drawings can be arranged and designed in a variety of configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings. In the description of the present invention, the terms "first", "second" and the like are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
Fig. 1 is a block diagram of the electronic device 100. The electronic device 100 may be a personal computer (PC), a tablet computer, a personal digital assistant (PDA), or the like. The electronic device 100 includes: the ancient-book title recognition device, a memory 110, a memory controller 120, a processor 130, a peripheral interface 140, an input/output unit 150, an audio unit 160 and a display unit 170.
The memory 110, memory controller 120, processor 130, peripheral interface 140, input/output unit 150, audio unit 160 and display unit 170 are electrically connected to one another, directly or indirectly, to transmit or exchange data. For example, these elements can be electrically connected to one another through one or more communication buses or signal lines. The ancient-book title recognition device includes at least one software function module that can be stored in the memory 110 in the form of software or firmware, or built into the operating system (OS) of the client device. The processor 130 executes the executable modules stored in the memory 110, such as the software function modules or computer programs included in the ancient-book title recognition device.
The memory 110 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or the like. The memory 110 stores a program, and the processor 130 executes the program after receiving an execution instruction. The method performed by the electronic device 100, as defined by the flow disclosed in any embodiment of the present invention, can be applied to the processor 130 or implemented by the processor 130.
The processor 130 may be an integrated circuit chip with signal processing capability. The processor 130 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), or the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor.
The peripheral interface 140 couples various input/output devices to the processor 130 and the memory 110. In some embodiments, the peripheral interface 140, processor 130 and memory controller 120 can be implemented in a single chip. In other embodiments, they can each be implemented by a separate chip.
The input/output unit 150 lets the user input data so that the user can interact with the electronic device 100. The input/output unit 150 may be, but is not limited to, a mouse, a keyboard, or the like.
The audio unit 160 provides an audio interface to the user and may include one or more microphones, one or more loudspeakers, and audio circuitry.
The display unit 170 provides an interactive interface (such as a user interface) between the electronic device 100 and the user, or displays image data for the user's reference. In this embodiment, the display unit 170 may be a liquid crystal display or a touch display. A touch display may be a capacitive or resistive touch screen that supports single-point and multi-point touch operations. Supporting single-point and multi-point touch operations means that the touch display can sense touch operations produced at one or more positions on it and pass the sensed operations to the processor 130 for calculation and processing.
First embodiment
Referring to Fig. 2, Fig. 2 is a flow chart of an ancient-book title recognition method provided by the first embodiment of the present invention. The method is applied to an electronic device. The flow shown in Fig. 2 is described in detail below. The method includes:
Step S110: Obtain an ancient-book text and segment it with a segmentation algorithm based on an ancient-book statistical model to obtain multiple tokens (segmented words).
The electronic device can obtain the ancient-book text that the user inputs for title recognition, yielding ancient-book text data. The electronic device can then split this text data with the pre-saved ancient-book statistical-model segmentation algorithm, obtaining multiple tokens.
The pre-saved ancient-book statistical-model segmentation algorithm can be an existing segmentation algorithm adapted to ancient books by adding conditions specific to them. Such a condition can be an ancient-book statistical model trained on an ancient-book corpus. With it, the algorithm can segment an ancient-book text into multiple tokens.
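The patent does not disclose the statistical model itself. As a minimal sketch of what statistical segmentation can look like, the following assumes a unigram model: each candidate word has a probability, and dynamic programming picks the split with the highest total log-probability. The `UNIGRAM` table, its values, `UNKNOWN_LOGP` and `MAX_WORD_LEN` are all hypothetical and stand in for a model trained on an ancient-book corpus.

```python
import math

# Hypothetical unigram probabilities, standing in for a model trained on an
# ancient-book corpus. The patent does not disclose the real model.
UNIGRAM = {
    "史记": 0.02,
    "史": 0.01,
    "记": 0.01,
    "太史公": 0.015,
    "曰": 0.05,
}
UNKNOWN_LOGP = math.log(1e-8)  # penalty for a single character the model has not seen
MAX_WORD_LEN = 4               # longest candidate word considered (assumption)


def segment(text: str) -> list[str]:
    """Return the split of `text` with the highest total log-probability."""
    n = len(text)
    # best[i] = (score, start): best score for text[:i] and where its last word starts
    best = [(0.0, 0)] + [(-math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = text[start:end]
            if word in UNIGRAM:
                logp = math.log(UNIGRAM[word])
            elif len(word) == 1:
                logp = UNKNOWN_LOGP  # unseen single characters are always allowed
            else:
                continue  # unseen multi-character strings are not candidates
            score = best[start][0] + logp
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk back from the end of the text to recover the chosen words.
    words = []
    i = n
    while i > 0:
        start = best[i][1]
        words.append(text[start:i])
        i = start
    return words[::-1]
```

Because single characters are always admissible, every position is reachable and the function returns a complete split even for text the toy model has never seen.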
Step S120: Classify each token against the pre-saved title knowledge base, where the categories include: book title, chapter title, music-and-dance title, title abbreviation, chapter-title abbreviation, and title containing a personal name.
The electronic device can obtain ancient books in which title marks and titles have already been marked through manual collation and proofreading, and extract text features and titles from them. Staff can then classify these text features and titles. The categories can be: book title (e.g. 《Records of the Grand Historian》), chapter title (e.g. 《Biography of Fan Sheng》: 《Book of the Later Han, Biography of Fan Sheng》), music-and-dance title (e.g. 《Guangling San》), author + title abbreviation (e.g. Ban's 《Book》: the 《Book of Han》 written by Ban Gu), title + chapter-title abbreviation (e.g. 《Sui Zhi》: 《Book of Sui, Treatise on Classics and Books》), and title containing a personal name (e.g. 《Collected Works of Su Shi》).
The electronic device can obtain the data that staff produced in advance through this classification and save it as the title knowledge base. Further, the electronic device can receive data entered by staff, organize the title knowledge base, and build a title auxiliary dictionary. The auxiliary words include authors, titles, posthumous titles, choronyms (ancestral-home prefectures), birth-order ranks, forms of address, affixes, reign titles, government offices, military headquarters, tribes, dynasty names and the like. The device also resolves interference and ambiguity among the title auxiliary words, obtains the relations between each title abbreviation or chapter-title abbreviation in the knowledge base and its source, and updates the title knowledge base.
Once the title knowledge base has been saved, the electronic device can classify the tokens obtained in step S110. For example, 《Records of the Grand Historian》 is a book title, 《Biography of Fan Sheng》 is a chapter title, 《Guangling San》 is a music-and-dance title, Ban's 《Book》 is an author + title abbreviation, 《Sui Zhi》 is a title + chapter-title abbreviation, and 《Collected Works of Su Shi》 is a title containing a personal name.
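The classification step amounts to a lookup that can return zero, one, or several categories per token. The sketch below assumes the knowledge base is an in-memory mapping from a term to its set of categories; the patent does not describe the storage format. The Chinese strings are a reading of the machine-translated examples, and the two-category entry is hypothetical.

```python
from enum import Enum


class Category(Enum):
    BOOK_TITLE = "book title"
    CHAPTER_TITLE = "chapter title"
    MUSIC_DANCE_TITLE = "music-and-dance title"
    TITLE_ABBREVIATION = "author + title abbreviation"
    CHAPTER_ABBREVIATION = "title + chapter-title abbreviation"
    TITLE_WITH_NAME = "title containing a personal name"


# Toy knowledge base (assumption: a plain dict; the real format is not disclosed).
KNOWLEDGE_BASE: dict[str, set[Category]] = {
    "史记": {Category.BOOK_TITLE},
    "范升传": {Category.CHAPTER_TITLE},
    "广陵散": {Category.MUSIC_DANCE_TITLE},
    "隋志": {Category.CHAPTER_ABBREVIATION},
    "苏轼文集": {Category.TITLE_WITH_NAME},
    # Hypothetical entry listed under two categories, to show an ambiguous token.
    "乐记": {Category.BOOK_TITLE, Category.CHAPTER_TITLE},
}


def classify(
    tokens: list[str], kb: dict[str, set[Category]] = KNOWLEDGE_BASE
) -> dict[str, set[Category]]:
    """Map each token to the set of categories the knowledge base lists for it."""
    # Copy each set so callers cannot mutate the knowledge base by accident.
    return {token: set(kb.get(token, set())) for token in tokens}
```

Returning a set rather than a single label keeps the ambiguity visible, which the later steps depend on.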
Step S130: For each token, when the token is judged to belong to exactly one of the categories and that category is not chapter title, save the token as a target title of the ancient-book text.
If a token belongs to exactly one of the categories, the token is precisely defined and is a generally accepted title.
In one embodiment, referring to Fig. 3, the method can also include:
Step S140: For each token, when the token is judged to belong to more than one category, save the token as a title to be marked.
For example, if a token belongs both to the book-title category and to the chapter-title category, it belongs to more than one category, so it is saved as a title to be marked.
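Steps S130 and S140 can be read as a partition of the classified tokens. The following is a minimal sketch under that reading; category names are plain strings here, and the sample data is hypothetical. The text does not say what happens to a token whose only category is chapter title, so the sketch simply leaves it out of both lists.

```python
CHAPTER_TITLE = "chapter title"


def partition(classified: dict[str, set[str]]) -> tuple[list[str], list[str]]:
    """Split classified tokens into (target titles, titles to be marked)."""
    targets: list[str] = []
    pending: list[str] = []
    for token, categories in classified.items():
        if len(categories) == 1 and CHAPTER_TITLE not in categories:
            targets.append(token)   # step S130: exactly one category, not chapter title
        elif len(categories) > 1:
            pending.append(token)   # step S140: ambiguous, needs further recognition
        # Tokens with no category, or only "chapter title", fall through (assumption).
    return targets, pending
```

For example, `partition({"史记": {"book title"}, "乐记": {"book title", "chapter title"}})` yields `(["史记"], ["乐记"])`.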
Step S150: Perform more precise recognition on each remaining title to be marked.
The more precise recognition can include the following schemes:
A: For each title to be marked, based on the pre-saved title knowledge base, when the title to be marked is judged to be an abbreviation of a target title and to have the title + chapter-title abbreviation structure, save it as a target title and update the target title library.
B: For each remaining title to be marked, save the paragraph position at which it occurs in the ancient-book text. Based on the pre-saved title knowledge base, when the title to be marked is judged to be an abbreviation of a target title and to be an author + title abbreviation, check whether a cue character occurs within a preset number of lines before and after that paragraph position. If so, save the title to be marked as a target title and update the target title library. The preset number of lines can be 5 and can be adjusted according to the actual situation. The cue characters are characters that introduce an explanation, such as "曰" ("says"), "云" ("says"), "谓" ("means") or ":".
C: For each remaining title to be marked, based on the pre-saved title knowledge base, judge whether the title to be marked and the content within the preset number of lines before and after its paragraph position form a personal name + title structure. If so, save the title to be marked as a target title and update the target title library.
D: For the remaining titles to be marked, select those that contain variant or traditional Chinese characters and match them against the pre-saved title knowledge base. If the match succeeds, save the title to be marked as a target title and update the target title library.
E: Based on the pre-saved title knowledge base, when the title to be marked is judged to be an abbreviation of a target title and to have the title + chapter-title abbreviation structure, save it as a target title and update the target title library.
Schemes A to D can be executed in the order given or in a different order, for example B first and then C, D and A. It is also possible to execute only some of schemes A to D.
It is worth noting that, when scheme E is executed, at least one of schemes A to D is typically executed before it.
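The schemes behave like a chain of filters: each one confirms some of the pending candidates and passes the rest on. The sketch below illustrates that structure with simplified versions of schemes B and D only. The cue characters follow a reading of the machine-translated list, and `VARIANT_MAP`, the sample abbreviations and the candidate format are hypothetical.

```python
from functools import partial
from typing import Callable

Candidate = tuple[str, int]  # (candidate text, index of the line where it occurs)

CUE_CHARS = ("曰", "云", "谓", ":", "：")  # assumed reading of the cue characters
WINDOW = 5                                  # preset number of lines; adjustable
VARIANT_MAP = {"漢": "汉", "書": "书"}       # hypothetical variant-to-standard mapping


def rule_b(cand: Candidate, lines: list[str], abbreviations: set[str]) -> bool:
    """Scheme B (simplified): a known abbreviation with a cue character nearby."""
    text, line_no = cand
    if text not in abbreviations:
        return False
    lo = max(0, line_no - WINDOW)
    hi = line_no + WINDOW + 1  # slicing past the end of the list is safe
    return any(ch in line for line in lines[lo:hi] for ch in CUE_CHARS)


def rule_d(cand: Candidate, known_titles: set[str]) -> bool:
    """Scheme D (simplified): normalize variant characters, then match a known title."""
    text, _ = cand
    normalized = "".join(VARIANT_MAP.get(ch, ch) for ch in text)
    return normalized != text and normalized in known_titles


def refine(
    pending: list[Candidate], rules: list[Callable[[Candidate], bool]]
) -> tuple[list[Candidate], list[Candidate]]:
    """Apply rules in the given order; each rule sees only what earlier rules left."""
    confirmed: list[Candidate] = []
    remaining = list(pending)
    for rule in rules:
        still_pending = []
        for cand in remaining:
            if rule(cand):
                confirmed.append(cand)
            else:
                still_pending.append(cand)
        remaining = still_pending
    return confirmed, remaining
```

Since `refine` takes the rules as a list, reordering the schemes or running only a subset is a matter of changing that list, for instance `[partial(rule_d, known_titles=...), partial(rule_b, lines=..., abbreviations=...)]`.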
In one embodiment, referring to Fig. 4, the method can also include:
Step S160: Add title marks to each target title obtained.
The title marks are "《》". For example, if the electronic device identifies "Records of the Grand Historian" as a target title, it adds "《》" at each position in the ancient-book text where "Records of the Grand Historian" occurs, giving "《Records of the Grand Historian》".
The title marks need not be "《》". They can be other marks, such as an underline, a wavy underline or a double underline, which are not enumerated here.
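A minimal sketch of the marking step follows. It wraps occurrences by plain string matching, which is an assumption: an implementation that tracks token positions from the segmentation step would avoid marking accidental substring matches, and this sketch would also double-wrap a title that is already marked.

```python
import re


def annotate(
    text: str, titles: list[str], open_mark: str = "《", close_mark: str = "》"
) -> str:
    """Wrap every occurrence of each target title in title marks."""
    if not titles:
        return text
    # Try longer titles first so a short title does not split a longer one.
    ordered = sorted(set(titles), key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(title) for title in ordered))
    return pattern.sub(lambda m: f"{open_mark}{m.group(0)}{close_mark}", text)
```

The `open_mark` and `close_mark` parameters are hypothetical hooks for other bracket-style marks. Styles such as an underline would need a different output format than plain text.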
It is also worth noting that the electronic device can update the title knowledge base periodically, so as to provide a more accurate reference for title recognition.
The beneficial effects of the ancient-book title recognition method provided by the first embodiment of the present invention are as follows. An ancient-book text is obtained and segmented with a segmentation algorithm based on an ancient-book statistical model to obtain multiple tokens. Each token is classified against a pre-saved title knowledge base, where the categories include: book title, chapter title, music-and-dance title, title abbreviation, chapter-title abbreviation, and title containing a personal name. For each token, when the token is judged to belong to exactly one of the categories, the token is saved as a target title of the ancient-book text. The method alleviates the problems that traditional manual title marking is time-consuming and laborious and that traditional machine marking performs poorly. Further, the present invention makes full use of the various ways in which ancient books refer to titles, provides a more practical and more accurate title recognition method, and gives strong support to automatic title recognition and marking in the collation of ancient books. In addition, with publishing applications in mind, the present invention retains the advantages of the statistical model and the title knowledge base and adds on this basis a framework for self-learning and supplementation, so that accurate marking can be provided on a continuing basis.
Second embodiment
Referring to Fig. 5, Fig. 5 is a block diagram of an ancient-book title recognition device 400 provided by the second embodiment of the present invention. The block diagram shown in Fig. 5 is described below. The device includes:
An acquisition module 410, for obtaining an ancient-book text and segmenting it with a segmentation algorithm based on an ancient-book statistical model to obtain multiple tokens;
A classification module 420, for classifying each token against the pre-saved title knowledge base, where the categories include: book title, chapter title, music-and-dance title, title abbreviation, chapter-title abbreviation, and title containing a personal name;
An execution module 430, for saving a token as a target title of the ancient-book text when the token is judged to belong to exactly one of the categories and that category is not chapter title.
In one embodiment, referring to Fig. 6, the device can also include:
A saving module 440, for saving a token as a title to be marked when the token is judged to belong to more than one category.
A recognition module 450, for performing more precise recognition on each remaining title to be marked.
In one embodiment, referring to Fig. 7, the device can also include:
A marking module 460, for adding title marks to each target title obtained, where the title marks are "《》".
For the way each functional module of the ancient-book title recognition device 400 in this embodiment implements its function, refer to the description of the embodiments shown in Fig. 1 to Fig. 4 above, which is not repeated here.
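The module structure can be pictured as three callables wired in sequence. This is only a sketch of the composition: the class name, the field names and the toy stand-ins (whitespace splitting in place of real segmentation, a two-entry knowledge base) are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TitleRecognitionDevice:
    acquire: Callable[[str], list[str]]                    # role of acquisition module 410
    classify: Callable[[list[str]], dict[str, set[str]]]   # role of classification module 420
    execute: Callable[[dict[str, set[str]]], list[str]]    # role of execution module 430

    def run(self, text: str) -> list[str]:
        """Segment the text, classify the tokens, and return the target titles."""
        tokens = self.acquire(text)
        classified = self.classify(tokens)
        return self.execute(classified)


# Toy stand-ins so the sketch runs end to end.
TOY_KB = {"史记": {"book title"}, "范升传": {"chapter title"}}

device = TitleRecognitionDevice(
    acquire=lambda text: text.split(),  # whitespace split stands in for segmentation
    classify=lambda tokens: {t: TOY_KB.get(t, set()) for t in tokens},
    execute=lambda classified: [
        t for t, cats in classified.items()
        if len(cats) == 1 and "chapter title" not in cats
    ],
)
```

Keeping each module behind a plain callable means an optional module, such as the saving or marking module, can be added as another field without changing the others.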
In summary, the recognition methods of ancient books title and device that the embodiment of the present invention proposes, by obtaining ancient books text, Word segmentation processing is carried out to the ancient books text based on ancient books statistical model segmentation methods, obtains multiple participles;Based on pre-saving Title knowledge base, classify to each, wherein, class categories include:Title, piece name, dance accompanied by music name, title referred to as, Piece name abbreviation and title containing name;For each participle, when judging a classification that the participle is pertaining only in class categories, The participle is saved as to the target title of the ancient books text.This method can alleviate traditional time-consuming consumption of manual title mark Power, traditional machine mark the problem of effect is poor.The precision and quality of ancient books title identification are improved, so as to meet Study of Ancient Books Practical application request in publishing work.
In several embodiments provided herein, it should be understood that disclosed apparatus and method, can also pass through Other modes are realized.Device embodiment described above is only schematical, for example, flow chart and block diagram in accompanying drawing Show the device of multiple embodiments according to the present invention, method and computer program product architectural framework in the cards, Function and operation.At this point, each square frame in flow chart or block diagram can represent the one of a module, program segment or code Part, a part for the module, program segment or code include one or more and are used to realize holding for defined logic function Row instruction.It should also be noted that at some as in the implementation replaced, the function that is marked in square frame can also with different from The order marked in accompanying drawing occurs.For example, two continuous square frames can essentially perform substantially in parallel, they are sometimes It can perform in the opposite order, this is depending on involved function.It is it is also noted that every in block diagram and/or flow chart The combination of individual square frame and block diagram and/or the square frame in flow chart, function or the special base of action as defined in performing can be used Realize, or can be realized with the combination of specialized hardware and computer instruction in the system of hardware.
In addition, each functional module in each embodiment of the present invention can integrate to form an independent portion Point or modules individualism, can also two or more modules be integrated to form an independent part.
If the function is realized in the form of software function module and is used as independent production marketing or in use, can be with It is stored in a computer read/write memory medium.Based on such understanding, technical scheme is substantially in other words The part to be contributed to prior art or the part of the technical scheme can be embodied in the form of software product, the meter Calculation machine software product is stored in a storage medium, including some instructions are causing a computer equipment (can be People's computer, server, or network equipment etc.) perform all or part of step of each embodiment methods described of the present invention. And foregoing storage medium includes:USB flash disk, mobile hard disk, read-only storage (ROM, Read-Only Memory), arbitrary access are deposited Reservoir (RAM, Random Access Memory), magnetic disc or CD etc. are various can be with the medium of store program codes.Need Illustrate, herein, such as first and second or the like relational terms be used merely to by an entity or operation with Another entity or operation make a distinction, and not necessarily require or imply between these entities or operation any this reality be present The relation or order on border.Moreover, term " comprising ", "comprising" or its any other variant are intended to the bag of nonexcludability Contain, so that process, method, article or equipment including a series of elements not only include those key elements, but also including The other element being not expressly set out, or also include for this process, method, article or the intrinsic key element of equipment. In the absence of more restrictions, the key element limited by sentence "including a ...", it is not excluded that including the key element Process, method, other identical element also be present in article or equipment.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection. It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
The foregoing is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be defined by the scope of the claims.

Claims (10)

1. A method for recognizing book titles in ancient books, characterized in that the method comprises:
obtaining an ancient book text, and performing word segmentation on the ancient book text using an ancient-book statistical-model segmentation algorithm to obtain a plurality of word segments;
classifying each word segment based on a pre-saved title knowledge base, wherein the classification categories include: book title, chapter title, music-and-dance title, abbreviated book title, abbreviated chapter title, and book title containing a personal name;
for each word segment, when it is determined that the word segment belongs to only one of the classification categories and that category is not chapter title, saving the word segment as a target title of the ancient book text.
2. The method according to claim 1, characterized in that the method further comprises:
for each word segment, when it is determined that the word segment belongs to more than one classification category, saving the word segment as a to-be-marked title;
based on the pre-saved title knowledge base, when it is determined that the to-be-marked title is an abbreviation of a target title and has the structure "book title + abbreviated chapter title", saving the to-be-marked title as a target title and updating the target title library.
3. The method according to claim 2, characterized in that the method further comprises: for each remaining to-be-marked title, saving the paragraph position at which the to-be-marked title appears in the ancient book text;
based on the pre-saved title knowledge base, when it is determined that the to-be-marked title is an abbreviation of a target title and has the structure "author + abbreviated book title", determining whether a characteristic character appears within a preset number of lines before and after the paragraph position of the to-be-marked title;
if so, saving the to-be-marked title as a target title and updating the target title library.
4. The method according to claim 3, characterized in that the characteristic character includes: "saying", "cloud", "meaning", or ":".
5. The method according to claim 3, characterized in that the method further comprises:
for each remaining to-be-marked title, determining, based on the pre-saved title knowledge base, whether the to-be-marked title and the content within a preset number of lines before and after its paragraph position form a "personal name + book title" structure;
if so, saving the to-be-marked title as a target title and updating the target title library.
6. The method according to claim 5, characterized in that the method further comprises:
for the remaining to-be-marked titles, screening out the to-be-marked titles that contain variant characters or traditional Chinese characters;
matching the to-be-marked titles that contain variant characters or traditional Chinese characters against the pre-saved title knowledge base;
if the match succeeds, saving the to-be-marked title as a target title and updating the target title library.
7. The method according to claim 6, characterized in that the method further comprises:
for each remaining to-be-marked title, based on the pre-saved title knowledge base, when it is determined that the to-be-marked title is an abbreviation of a target title and has the structure "book title + abbreviated chapter title", saving the to-be-marked title as a target title and updating the target title library.
8. The method according to any one of claims 1-7, characterized in that the method further comprises:
marking each target title with title marks.
9. The method according to claim 8, characterized in that the title marks are 《》 or an underline.
10. A device for recognizing book titles in ancient books, characterized in that the device comprises:
an acquisition module, configured to obtain an ancient book text and perform word segmentation on the ancient book text using an ancient-book statistical-model segmentation algorithm to obtain a plurality of word segments;
a classification module, configured to classify each word segment based on a pre-saved title knowledge base, wherein the classification categories include: book title, chapter title, music-and-dance title, abbreviated book title, abbreviated chapter title, and book title containing a personal name;
an execution module, configured to, for each word segment, when it is determined that the word segment belongs to only one of the classification categories and that category is not chapter title, save the word segment as a target title of the ancient book text.
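For illustration only, the classification rule of claim 1, the first step of claim 2, and the marking step of claims 8 and 9 might be sketched as follows. This sketch is not part of the claims. It assumes the text has already been segmented (the statistical segmentation model is not reproduced here), and the category labels, the toy knowledge base, the sample segments, and the function names `classify_segments` and `mark_titles` are all hypothetical.

```python
# Hypothetical labels for the six classification categories named in claim 1.
BOOK = "book_title"
CHAPTER = "chapter_title"
MUSIC_DANCE = "music_dance_title"
BOOK_ABBR = "book_title_abbr"
CHAPTER_ABBR = "chapter_title_abbr"
BOOK_WITH_PERSON = "book_title_with_person"

# Toy stand-in for the pre-saved title knowledge base (assumed shape:
# word segment -> set of categories). The entries are illustrative only.
SAMPLE_KB = {
    "史記": {BOOK},
    "老子": {BOOK, BOOK_WITH_PERSON},
    "學而": {CHAPTER},
}

# Assumed output of the segmentation step for a made-up sentence.
SAMPLE_SEGMENTS = ["太史公", "作", "史記", "老子", "言", "道", "學而", "第一"]


def classify_segments(segments, knowledge_base):
    """Split segments into target titles and to-be-marked titles.

    A segment with exactly one category that is not "chapter title"
    becomes a target title (claim 1). A segment with more than one
    category becomes a to-be-marked title (claim 2, first step).
    Segments unknown to the knowledge base are ignored.
    """
    targets, pending = [], []
    for seg in segments:
        categories = knowledge_base.get(seg, set())
        if len(categories) == 1 and CHAPTER not in categories:
            if seg not in targets:
                targets.append(seg)
        elif len(categories) > 1:
            if seg not in pending:
                pending.append(seg)
    return targets, pending


def mark_titles(segments, targets):
    """Rebuild the text, wrapping every target title in 《》 (claims 8 and 9)."""
    target_set = set(targets)
    return "".join(f"《{seg}》" if seg in target_set else seg for seg in segments)


if __name__ == "__main__":
    targets, pending = classify_segments(SAMPLE_SEGMENTS, SAMPLE_KB)
    print(targets, pending)
    print(mark_titles(SAMPLE_SEGMENTS, targets))
```

The to-be-marked titles returned here would then pass through the further checks of claims 2 to 7 (abbreviation structure, characteristic characters near the paragraph position, name + title structure, variant or traditional character matching), which this sketch does not attempt.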
CN201711177794.9A 2017-11-22 2017-11-22 The recognition methods of ancient books title and device Pending CN107832303A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711177794.9A CN107832303A (en) 2017-11-22 2017-11-22 The recognition methods of ancient books title and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711177794.9A CN107832303A (en) 2017-11-22 2017-11-22 The recognition methods of ancient books title and device

Publications (1)

Publication Number Publication Date
CN107832303A true CN107832303A (en) 2018-03-23

Family

ID=61653236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711177794.9A Pending CN107832303A (en) 2017-11-22 2017-11-22 The recognition methods of ancient books title and device

Country Status (1)

Country Link
CN (1) CN107832303A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875808A (en) * 2018-05-17 2018-11-23 延安职业技术学院 A kind of book classification method based on artificial intelligence

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104615589A (en) * 2015-02-15 2015-05-13 百度在线网络技术(北京)有限公司 Named-entity recognition model training method and named-entity recognition method and device
CN206411669U (en) * 2016-08-31 2017-08-15 天津赛因哲信息技术有限公司 SaaS ancient book knowledge service cloud platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180323
