CN108959626A - A cross-platform method for efficient automatic generation of heterogeneous data bulletins - Google Patents

Publication number
CN108959626A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810811216.4A
Other languages
Chinese (zh)
Other versions
CN108959626B (en
Inventor
尹健康
张卫东
宋红文
刘宁
洪海舟
贺红梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Tobacco Co Chengdu Co
Original Assignee
Sichuan Tobacco Co Chengdu Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Tobacco Co Chengdu Co
Priority to CN201810811216.4A
Publication of CN108959626A
Application granted
Publication of CN108959626B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/103: Formatting, i.e. changing of presentation of documents
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/166: Editing, e.g. inserting or deleting
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a cross-platform method for efficient automatic generation of heterogeneous data bulletins, comprising the following steps: 1. massive heterogeneous data is processed by managing the data with SX404DB, a centralized key-value database; SX404DB is a key-value NoSQL database based on inverted-index technology; 2. bulletin content is generated automatically and dynamically by the DocumentScript script control system; 3. the presentation format is typeset automatically by injecting content into a format template based on Office OpenXML and compressing the result into a DOCX file. Compared with previous bulletin generation methods, this method supports massive heterogeneous data, offers flexible and extensible content generation, and produces a stable, highly compatible presentation format, with good stability, operability, and scalability.

Description

A cross-platform method for efficient automatic generation of heterogeneous data bulletins
Technical field
The present invention relates to computer data processing methods, and specifically to a cross-platform method for efficient automatic generation of heterogeneous data bulletins.
Background
The tobacco industry has a large market structure and a large organizational structure. Tobacco administration covers links such as tobacco market supervision, tobacco leaf production, and tobacco sales, and there are a great many tobacco sales outlets on the market. Managing the tobacco industry has therefore always involved a heavy workload.
Computerized data management has long been in place in tobacco administration. Management work, however, involves not only regulating the process of each link but also providing the management unit with a brief data bulletin that faithfully reflects the current state of the tobacco industry, as a reference for decision-makers.
Office automation has always been a key area of computer application, and electronic manuscript technology is a very important branch of it. In the narrow sense, an electronic manuscript is a digital manuscript that is read, edited, or published and printed on an electronic device; in the broad sense, it can refer to any multimedia digital document. An electronic bulletin is a special electronic manuscript dedicated to reporting and explanation, and is widely used in fields such as telecommunications, transportation, finance, education, and geology. It is a functional manuscript: electronic manuscripts with similar functions are very similar in content and form, and sometimes are almost identical apart from the data. Because writing bulletins often consumes a great deal of manual labor, research on automatic bulletin generation is of great significance.
Automatic bulletin generation can be divided into three levels: data organization, bulletin content generation, and format setting. In traditional application systems the data source is usually a relational database, whose strong transactional consistency and ease of use are beyond question. Modern electronic bulletins, however, are mostly no longer written from small-scale statistics; they are based on large amounts of multi-source heterogeneous data, for which a traditional relational database is clearly unsuitable as the data source of an automatic generation system. Most existing bulletin generation systems generate content in one of two ways. One generates the bulletin content from beginning to end with a programming language; the other generates it by placeholder replacement, in short, from a template. Because most popular document formats are proprietary, previous bulletin generation systems typeset electronic documents by calling third-party APIs, but the typesetting results are often unsatisfactory because of incomplete system compatibility or incomplete help documentation.
Summary of the invention
To solve the above problems, the present invention provides a cross-platform method for efficient automatic generation of heterogeneous data bulletins that uses a NoSQL database as the data source, generates bulletin content through a dynamic script, and customizes the presentation format according to the Office OpenXML standard.
The cross-platform method for efficient automatic generation of heterogeneous data bulletins of the present invention comprises the following steps:
(1) Massive heterogeneous data processing: data is managed with the SX404DB centralized key-value database. SX404DB is a key-value NoSQL database based on inverted-index technology.
(2) Automatic generation of bulletin content: bulletin content is generated dynamically by the DocumentScript script control system.
(3) Automatic typesetting of the presentation format: content is injected into a format template based on Office OpenXML, and the result is compressed into a DOCX file.
In the method described above, managing data with the SX404DB centralized key-value database further comprises: creating a database Session object to open a database session with the SX404DB database. Specifically, an entity object is created first, and its code, type, region, and time attributes are set. The query method of the session object is then called with the combined condition as its parameter to retrieve all entity objects that satisfy the corresponding conditions.
In the method described above, managing data with the SX404DB centralized key-value database is further carried out through the following packages contained in SX404DB: the convertor package converts formats between data objects; the Directory package manages the index directory; the Index package queries and modifies the index; the Properties package configures the database;
the Session package manages access to database sessions; the Condition package provides the condition functions used in data operations; and the Sort package provides the sorting function used in data queries.
Users access resources in the database through the overloaded save, delete, update, and query methods of the Session class. The Session class has a dependency relationship with the Searcher class, the Processer class, and the DocumentConvertor class:
the Session class queries data through the Searcher class, modifies data through the Processer class, and converts data object formats through the DocumentConvertor class.
The ConcurrentDirectory class has an aggregation relationship with the Searcher class and the Processer class: an object of the ConcurrentDirectory class appears as an attribute in each of the Searcher and Processer classes.
The Processer class provides the following methods. The delete method deletes data logically: when it is called, the affected data enters the recovery area, which is emptied by the clearTrash method. The forceDelete method deletes data physically, and physically deleted data cannot be restored. The insert method adds data, and the update method modifies data.
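The difference between logical and physical deletion can be illustrated with a minimal in-memory sketch. It is not the SX404DB Processer class: the class name TrashStoreSketch, the map-based storage, the restore method, and all signatures are assumptions made for illustration; only the method names insert, delete, forceDelete, and clearTrash come from the text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the delete semantics described above.
class TrashStoreSketch {
    private final Map<String, String> live = new LinkedHashMap<>();
    private final Map<String, String> trash = new LinkedHashMap<>();

    void insert(String id, String value) {
        live.put(id, value);
    }

    // Logical delete: the record moves to the recovery area and can still be restored.
    boolean delete(String id) {
        String value = live.remove(id);
        if (value == null) {
            return false;
        }
        trash.put(id, value);
        return true;
    }

    // Assumed helper (not named in the text): bring a logically deleted record back.
    boolean restore(String id) {
        String value = trash.remove(id);
        if (value == null) {
            return false;
        }
        live.put(id, value);
        return true;
    }

    // Physical delete: the record is dropped everywhere and cannot be restored.
    boolean forceDelete(String id) {
        boolean wasLive = live.remove(id) != null;
        boolean wasTrashed = trash.remove(id) != null;
        return wasLive || wasTrashed;
    }

    // Empty the recovery area.
    void clearTrash() {
        trash.clear();
    }

    int liveCount() {
        return live.size();
    }

    int trashCount() {
        return trash.size();
    }
}
```

Keeping the recovery area as a separate map makes clearTrash a single operation and keeps logically deleted records out of normal reads.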
The entire SX404DB database provides exactly one ConcurrentDirectory instance per file path, and storage, query, modification, and deletion are all performed by operating on that instance. The thread lock of ConcurrentDirectory separates reads from writes, and all write operations are executed synchronously in a queue.
In the method described above, managing data with the SX404DB centralized key-value database further comprises: every complete data object is stored in units of the Document class, and each Document object contains several Field members.
In the method described above, dynamically generating bulletin content through the DocumentScript script control system comprises:
(1) Lexical analysis: the source code is split into tokens.
(2) Syntactic analysis: the hierarchical logical relationships between the tokens are worked out, and several abstract syntax trees are generated.
(3) Machine language generation: this step is handled by an interpreted-language processor; the interpreter interprets and executes the abstract syntax trees one by one and feeds back the execution results.
In the method described above, lexical analysis is performed by a tokenizer implemented as the Lexer class, which splits the source code string into Tokens; tokenization is done by regular-expression matching. Specifically:
Tokens are divided into string tokens, numeric tokens, identifier tokens, and the end-of-file token. The end-of-file Token is implemented as a singleton embedded in the Token class as a static member.
The Lexer class has four string fields, comPat, numPat, strPat, and idPat, which hold the regular expressions that match comments, numeric Tokens, string Tokens, and identifier Tokens respectively. It also has a string field regexPat, which holds the regular expression that matches every legal string in DocumentScript. When the analysis process of the Lexer class runs, the tokenizer reads the source code line by line, checks the content of each line from left to right against regexPat, and extracts every matching string.
The Lexer object obtains the source code by receiving a Reader object.
For the syntactic analysis that follows tokenization, the Lexer class provides a peek method. Building an abstract syntax tree is a depth-first process with backtracking: if a construction error is discovered midway, several tokens must be rolled back and the tree rebuilt. For this purpose a peek method and a buffer queue for temporarily stored Tokens are provided. When an abstract syntax tree is built, the peek method is first used to learn which Tokens will be read next and to place them in the buffer queue; the content of the buffer is then examined; finally the Tokens are obtained through the read method and the abstract syntax tree is built.
In the implementation of the Lexer class, every call to the read method checks whether the buffer queue is empty. If it is, a Token is added to the buffer queue; the Token at the head of the queue is then returned and removed from the queue. Each call to the peek method stores the pre-read Tokens in the buffer queue and returns the requested one as the function's return value, without removing any element from the queue.
Generating the abstract syntax trees means assembling the Token sequence from a simple linear structure into a tree structure according to the grammar rules of the language. The abstract syntax tree is defined as an interface named ASTree, and several node classes implement the ASTree interface.
In the method described above, having the interpreter interpret and execute the abstract syntax trees one by one and feed back the results further comprises: the interpreter evaluates each abstract syntax tree. Evaluation traverses the whole tree recursively from the root node down to the leaf nodes. Every visited node produces a return value, and the return value of every node other than a leaf depends on the return values of its child nodes.
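The recursive evaluation described above can be sketched in a few lines. The interface name ASTree and the leaf class name NumberLiteral come from the text; the eval signature, the integer-only values, and the BinaryExprSketch node are assumptions made for illustration and are not the DocumentScript implementation.

```java
// Demo entry point; evaluates the tree for 1 + (2 * 3).
class AstEvalDemo {
    public static void main(String[] args) {
        ASTree tree = new BinaryExprSketch(
                new NumberLiteral(1), '+',
                new BinaryExprSketch(new NumberLiteral(2), '*', new NumberLiteral(3)));
        System.out.println(tree.eval()); // prints 7
    }
}

// Node interface named in the text; the eval() signature is an assumption.
interface ASTree {
    int eval();
}

// Leaf node: its value does not depend on any child.
final class NumberLiteral implements ASTree {
    private final int value;

    NumberLiteral(int value) {
        this.value = value;
    }

    @Override
    public int eval() {
        return value;
    }
}

// Hypothetical non-leaf node: its value depends on the values of its two children.
final class BinaryExprSketch implements ASTree {
    private final ASTree left;
    private final char op;
    private final ASTree right;

    BinaryExprSketch(ASTree left, char op, ASTree right) {
        this.left = left;
        this.op = op;
        this.right = right;
    }

    @Override
    public int eval() {
        int l = left.eval();   // recurse into the left subtree
        int r = right.eval();  // recurse into the right subtree
        switch (op) {
            case '+': return l + r;
            case '-': return l - r;
            case '*': return l * r;
            default: throw new IllegalArgumentException("unsupported operator: " + op);
        }
    }
}
```

Because each node only asks its children for their values, new node types can be added without changing the traversal itself.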
In the method described above, having the tokenizer read the source code line by line further comprises: if the matched string, not counting leading whitespace, matches comPat, the string is a comment; if the string is a numeric literal, a string literal, or an identifier, it matches numPat, strPat, or idPat respectively. A Token whose type has been determined is stored in the Token queue to be returned, and the remainder is processed in the same way. This repeats until the source code ends, so that the Lexer splits the source code into a Token queue.
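A minimal sketch of this regular-expression tokenization follows. The field names comPat, numPat, strPat, idPat, and regexPat come from the text, but the expressions themselves, the `//` comment syntax, the class name RegexLexerSketch, and the string-tagged token representation are assumptions; the actual DocumentScript patterns are not given in the source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexLexerSketch {
    // Field names follow the text; the expressions are illustrative assumptions.
    static final String comPat = "//.*";
    static final String numPat = "[0-9]+";
    static final String strPat = "\"(?:\\\\.|[^\"\\\\])*\"";
    static final String idPat = "[A-Za-z_][A-Za-z_0-9]*|\\p{Punct}";

    // Optional leading whitespace, then exactly one of the four token kinds.
    static final Pattern regexPat = Pattern.compile(
            "\\s*(?:(?<com>" + comPat + ")|(?<num>" + numPat + ")"
            + "|(?<str>" + strPat + ")|(?<id>" + idPat + "))");

    // Split one source line into tagged tokens; comments are matched and dropped.
    static List<String> tokenizeLine(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = regexPat.matcher(line);
        int pos = 0;
        while (pos < line.length()) {
            m.region(pos, line.length());
            if (!m.lookingAt()) {
                if (line.substring(pos).trim().isEmpty()) {
                    break; // only trailing whitespace is left
                }
                throw new IllegalArgumentException("bad token at column " + pos);
            }
            if (m.group("num") != null) {
                tokens.add("NUM:" + m.group("num"));
            } else if (m.group("str") != null) {
                tokens.add("STR:" + m.group("str"));
            } else if (m.group("id") != null) {
                tokens.add("ID:" + m.group("id"));
            }
            pos = m.end(); // continue with the remainder of the line
        }
        return tokens;
    }
}
```

The comment alternative is listed first so that `//` is recognized as a comment before the punctuation branch of idPat can claim the first slash.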
In the method described above, the node classes that implement the ASTree interface are divided into leaf nodes and non-leaf nodes. A leaf node contains no child nodes, whereas a non-leaf node may contain child nodes. There are four leaf node classes, Name, NumberLiteral, StringLiteral, and NullStmnt, all of which inherit from the leaf node class; all non-leaf nodes inherit from the non-leaf node class. Non-leaf nodes fall into three categories: the first is used for program flow control and includes the sequence structure, branch structure, and loop structure classes; the second is used for processing expressions; the third is used for functions.
In the method described above, the Lexer object, which obtains the source code by receiving a Reader object, provides two methods, read and peek. The read method takes Tokens one at a time starting from the head of the source code and returns a new Token each time it is called. The peek method pre-reads Tokens: peek(i) returns the i-th Token after the Token that the read method is about to return. Once the source code has been read completely, both read and peek return Token.EOF.
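The interplay of read, peek, and the buffer queue can be sketched as follows. This is a simplified stand-in, not the Lexer class: tokens are plain strings taken from a list rather than from a Reader, the EOF constant stands in for Token.EOF, and the indexing convention (peek(0) is the token that read would return next) is an assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class BufferedTokenSourceSketch {
    static final String EOF = "<EOF>"; // stand-in for Token.EOF

    private final Iterator<String> source;
    private final List<String> queue = new ArrayList<>(); // buffer of pre-read tokens

    BufferedTokenSourceSketch(List<String> tokens) {
        this.source = tokens.iterator();
    }

    // Ensure the queue holds at least i + 1 tokens; false if the source runs out first.
    private boolean fill(int i) {
        while (queue.size() <= i) {
            if (!source.hasNext()) {
                return false;
            }
            queue.add(source.next());
        }
        return true;
    }

    // Return the token at the head of the queue and remove it.
    String read() {
        return fill(0) ? queue.remove(0) : EOF;
    }

    // Look ahead without consuming: nothing is removed from the queue.
    String peek(int i) {
        return fill(i) ? queue.get(i) : EOF;
    }
}
```

A parser built on this can peek at the next few tokens to choose a grammar rule and only then call read, so a wrong guess costs nothing.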
Beneficial effects of the present invention:
Compared with previous bulletin generation methods, this method supports massive heterogeneous data, offers flexible and extensible content generation, and produces a stable and highly compatible presentation format. The system has good stability, operability, and scalability.
The inverted-index-based NoSQL database SX404DB of the present invention is a key-value NoSQL database. It is built on Lucene indexes and describes and manages data with a star model. With it, developers no longer need to labor over tedious table creation and database compatibility problems, because SX404DB has no tables at all: all data is managed in a spider-style fashion, with neither hierarchical relationships nor constraints between data items. The database can easily manage structured, semi-structured, and unstructured data, and even supports direct storage and retrieval of Java objects.
The present invention implements a content injection method for DOCX documents of fixed format.
In writing a bulletin, format setting is a task that takes considerable time and effort, yet it can be handed over entirely to the computer. DOCX, the standard format of Microsoft Office documents, has a very high market share in the office document field, and the content injection method implemented here targets this format. The method only needs to inject the corresponding content into a template according to the Office OpenXML standard to unify the document format easily. Using it can save bulletin writers a large amount of formatting time.
The present invention designs DocumentScript, a scripting language for automatic generation of document content, and completes the development of its processing platform in a Java environment.
Bulletin content is often generated automatically by tag replacement, whereas script control is used here. Document content generated this way is more flexible: it supports not only sequential content but also conditional and looped generation. DocumentScript is a scripting language, that is, an interpreted language that can be executed without being compiled for a specific platform, so controlling content generation with it gives better platform independence and extensibility.
Detailed description
SX404DB of the present invention is a key-value NoSQL database based on inverted-index technology and developed in Java. Building on the inverted index created by Lucene, it implements an ORM mechanism through dynamic proxy technology and ultimately provides data storage and access in units of Java objects.
All database operations in SX404DB are based on a database session, so a database session object, the Session object, must be created before the database is accessed. Through a Session object, the system can conveniently access and manage the entire SX404DB database.
Example of storing data: first create an entity object of the tobacco sales total class (this can be any user-defined class that conforms to the JavaBean specification), then set the object's id, type, province, and time attributes to "1", "Chengdu tobacco", "Chengdu", and "2016-03-01", and finally call the save method of the session object to store the object.
Example of querying data: first create a MultiCondition (combined condition) object, then create two simple conditions (province equal to "Chengdu" and time equal to "2016-03-01") and add them to the combined condition as mandatory conditions (ConditionOccur.MUST indicates that a condition is mandatory in the combined query). Finally, call the query method of the session object with the combined condition as its parameter to retrieve all tobacco sales total objects that satisfy the conditions.
Example of modifying data: first create a simple condition whose id value is "1", then create a modification item whose province value is "Hunan", and finally call the update method of the session object with this query condition and modification item as parameters. The province value of every tobacco sales total object in the database whose id is "1" is changed to "Hunan".
Example of deleting data: first create a simple condition whose id value is "1", then call the delete method of the session object with this condition as its parameter to delete every tobacco sales total object in the database whose id is "1".
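The four examples above can be mirrored with a small in-memory sketch. It is not the SX404DB API, whose signatures are not given in the source: the class SessionSketch is hypothetical, entities are plain attribute maps instead of JavaBeans, and a combined condition with MUST clauses is modeled simply as a map of required attribute values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a session with save/query/update/delete.
class SessionSketch {
    private final List<Map<String, String>> rows = new ArrayList<>();

    public static void main(String[] args) {
        SessionSketch session = new SessionSketch();
        session.save(Map.of("id", "1", "type", "Chengdu tobacco",
                "province", "Chengdu", "time", "2016-03-01"));
        System.out.println(session.query(Map.of("province", "Chengdu", "time", "2016-03-01")));
        session.update(Map.of("id", "1"), Map.of("province", "Hunan"));
        System.out.println(session.query(Map.of("id", "1")));
        session.delete(Map.of("id", "1"));
        System.out.println(session.query(Map.of()).size());
    }

    // Store a copy of the entity so later changes by the caller do not leak in.
    void save(Map<String, String> entity) {
        rows.add(new HashMap<>(entity));
    }

    // Every entry in "must" is a mandatory equality condition.
    private static boolean matches(Map<String, String> row, Map<String, String> must) {
        return row.entrySet().containsAll(must.entrySet());
    }

    List<Map<String, String>> query(Map<String, String> must) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> row : rows) {
            if (matches(row, must)) {
                hits.add(row);
            }
        }
        return hits;
    }

    // Apply the modification items to every matching row; returns the number changed.
    int update(Map<String, String> must, Map<String, String> changes) {
        List<Map<String, String>> hits = query(must);
        for (Map<String, String> row : hits) {
            row.putAll(changes);
        }
        return hits.size();
    }

    // Remove every matching row; returns the number removed.
    int delete(Map<String, String> must) {
        int before = rows.size();
        rows.removeIf(row -> matches(row, must));
        return before - rows.size();
    }
}
```

Reusing the same condition shape for query, update, and delete reflects how the examples above pass one condition object to all three session methods.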
SX404DB contains 7 packages, each responsible for a different task, as described in the table below. The convertor, directory, index, and properties packages are 4 low-level packages, responsible respectively for format conversion between data objects, index directory management, index query and modification, and basic database configuration. The session, condition, and sort packages are 3 application-layer packages, responsible respectively for database access, conditions in data operations, and sorting in data queries. The classes in these three packages are provided mainly for third-party applications that access the database.
Package                                   Description
cn.edu.cug.sx404.database.condition       Manages query, modification, and deletion conditions
cn.edu.cug.sx404.database.convertor       Converts formats between data objects
cn.edu.cug.sx404.database.directory       Manages the index directory
cn.edu.cug.sx404.database.index           Queries and modifies the index
cn.edu.cug.sx404.database.properties      Configures the database
cn.edu.cug.sx404.database.session         Manages access to database sessions
cn.edu.cug.sx404.database.sort            Provides sorting in data queries
Relationship model of the main SX404DB classes: the Session class has several overloaded save, delete, update, and query methods, through which users access the resources in the database. The Session class has a dependency relationship with the Searcher, Processer, and DocumentConvertor classes; that is, the Session class queries and modifies data through the Searcher and Processer classes respectively, and converts data object formats through the DocumentConvertor class. The ConcurrentDirectory class has an aggregation relationship with the Searcher and Processer classes; in other words, an object of the ConcurrentDirectory class appears as an attribute in each of the Searcher and Processer classes.
Adding, deleting, modifying, and querying data are the most basic functions of a database. In SX404DB these low-level operations are built on the inverted-index technology of the Lucene framework, in which every complete data object is stored in units of the Document class. The basic storage unit of the database is the JavaBean (simply put, a Java entity class that follows object-oriented design principles), but what is actually stored in the database index is not the JavaBean object itself but a Document object that reflects its characteristics. Each Document object contains several Field members, and it is through these Field members that the Document object reflects the characteristics of the JavaBean object.
Logically, an index created by Lucene consists of Segments, Documents, Fields, and Terms. A Lucene index is made up of a series of files with different filename prefixes or suffixes; their specific functions are described in the table of index subfile functions, as follows:
In SX404DB, access to the underlying Documents is implemented by the Searcher and Processer classes: the Searcher class provides data query, and the Processer class provides data addition, modification, and deletion.
The Searcher class provides 3 methods, summarized in Table 3.1.3. The getInstance method is a static method that creates an instance of the Searcher class. The search method queries data: when executed, it returns a Document sequence according to the specified query condition and sorting rule. The getDocumentByDocID method looks up a Document object by ID. The methods are as follows:
The Processer class provides 8 methods, summarized in the table below. The getInstance method is a static method that creates an instance of the Processer class. The delete method deletes data logically: when it is called, the affected data enters the recovery area, which is emptied by the clearTrash method. The forceDelete method deletes data physically; once data has been physically deleted it cannot be restored. The insert and update methods add and modify data respectively. Summary table of the Processer class methods:
To access the underlying data, Searcher and Processer in turn call Lucene's three main index access classes: IndexWriter, IndexReader, and IndexSearcher. By the nature of their index operations, these three classes can be divided into read classes and write classes. IndexReader and IndexSearcher are read classes: IndexReader extracts the corresponding Document from the index by its unique ID, and IndexSearcher finds the set of IDs of the Documents that match a specified query condition. IndexWriter is a write class that lets the caller write to the index; whenever the user needs to add, delete, or modify data, it can be used to reorganize the index and write it to disk or memory.
Encapsulation and conversion of data objects
What SX404DB stores at the bottom layer is a large number of isolated key-value pairs, and the Lucene framework only provides access to Document objects. Direct storage and retrieval of JavaBean objects therefore first requires conversion between Document objects and JavaBean objects.
The conversion of SX404DB data objects is implemented mainly by the DocumentConvertor class, which follows the design idea of ORM (Object Relational Mapping). DocumentConvertor provides several conversion methods, including convert2Document, convert2Obj, convert2Query, castToQuery, castToField, and cast. cast converts between basic data types (such as int, long, double, and float) and the String type. castToQuery and castToField convert key-value pairs into Lucene Query and Field objects. convert2Document and convert2Obj convert between JavaBean and Document. convert2Query converts a JavaBean into a Query.
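The idea behind this conversion can be sketched with plain reflection. The sketch below is not DocumentConvertor: a Document is approximated by a map of field names to strings, and the class names SalesTotalSketch and BeanMapConverterSketch, the method names toDocument and toObject, and the signature of cast are all assumptions made for illustration.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical entity; fields are package-private to keep the sketch short.
class SalesTotalSketch {
    String id;
    String province;
    int amount;
}

class BeanMapConverterSketch {
    // Object -> "document": one string-valued entry per non-null instance field.
    static Map<String, String> toDocument(Object bean) {
        Map<String, String> doc = new LinkedHashMap<>();
        try {
            for (Field f : bean.getClass().getDeclaredFields()) {
                if (f.isSynthetic() || Modifier.isStatic(f.getModifiers())) {
                    continue;
                }
                f.setAccessible(true);
                Object value = f.get(bean);
                if (value != null) {
                    doc.put(f.getName(), String.valueOf(value));
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return doc;
    }

    // "Document" -> object: create an instance and fill every field found in the map.
    static <T> T toObject(Map<String, String> doc, Class<T> type) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T bean = ctor.newInstance();
            for (Field f : type.getDeclaredFields()) {
                String raw = doc.get(f.getName());
                if (raw == null || f.isSynthetic() || Modifier.isStatic(f.getModifiers())) {
                    continue;
                }
                f.setAccessible(true);
                f.set(bean, cast(raw, f.getType()));
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // String -> basic type, covering only the types mentioned in the text.
    static Object cast(String raw, Class<?> target) {
        if (target == int.class) return Integer.parseInt(raw);
        if (target == long.class) return Long.parseLong(raw);
        if (target == double.class) return Double.parseDouble(raw);
        if (target == float.class) return Float.parseFloat(raw);
        return raw; // String fields are stored as-is
    }
}
```

Storing every value as a string keeps the stored form uniform, at the cost of a type-directed cast on the way back.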
To improve the efficiency of the whole database, SX404DB uses a lazy data loading mode, in which the data in an entity object is loaded only at the moment it is accessed. This mode is also called dynamic data loading and is usually implemented with dynamic proxy technology. To make the proxy pattern easier to implement, the Java language itself provides a dynamic proxy mechanism, but it requires the proxy class and the implementation class to implement the same interface. Implementing dynamic proxies this way adds a lot of extra work to application-layer design, and once the interfaces are designed they are hard to extend later. SX404DB therefore uses the third-party library CGLIB to implement a more flexible dynamic data loading mode. In this mode, every entity object returned by a database query has been processed by the database: its attributes are all null until they are accessed, the data loading mechanism is invoked dynamically only on access, and the objects do not need to implement any predefined interface. CGLIB is a high-performance bytecode generation library that can modify already defined classes at the bytecode level, which yields a dynamic proxy mechanism independent of any interface.
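Lazy loading through a proxy can be sketched with the JDK's own java.lang.reflect.Proxy. This deliberately shows the interface-based mechanism that the text contrasts with CGLIB, not the CGLIB approach SX404DB uses, so the entity must implement an interface here. The names SalesRecordSketch, LoadedRecordSketch, LazyProxySketch, and loadCount are assumptions made for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

class LazyProxySketch {
    static int loadCount = 0; // counts how many times real data was loaded

    // Return a proxy that calls the loader only when a method is first invoked.
    static <T> T lazy(Class<T> iface, Supplier<T> loader) {
        InvocationHandler handler = new InvocationHandler() {
            private T target; // stays null until the first access

            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (target == null) {
                    target = loader.get();
                    loadCount++;
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // surface the original exception
                }
            }
        };
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(proxy);
    }
}

// The JDK mechanism needs an interface shared by the proxy and the real object.
interface SalesRecordSketch {
    String getProvince();
}

// Hypothetical fully loaded entity.
class LoadedRecordSketch implements SalesRecordSketch {
    private final String province;

    LoadedRecordSketch(String province) {
        this.province = province;
    }

    @Override
    public String getProvince() {
        return province;
    }
}
```

A caller would write `SalesRecordSketch r = LazyProxySketch.lazy(SalesRecordSketch.class, () -> new LoadedRecordSketch("Chengdu"));` and nothing is loaded until `r.getProvince()` is called. The need for SalesRecordSketch is exactly the constraint the text says a bytecode-level library removes.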
Handling of thread safety:
Thread safety in a distributed environment was considered from the outset in the design of the Lucene framework, so Lucene has a sound internal locking mechanism. The locks in Lucene, however, are fine-grained and can only guarantee the correct execution of data read and write operations. In SX404DB the basic logical unit of storage is not a key-value pair but a JavaBean object; for objects of this coarse granularity, the basic Lucene locking mechanism cannot guarantee that dirty reads will not occur.
To avoid conflicts between threads, Java provides two thread synchronization mechanisms. The first wraps the code block that needs synchronization with the synchronized keyword; the second achieves synchronization through thread locks. SX404DB uses the thread lock mechanism provided by the Java language itself to design a thread-safe directory management mechanism. The entire database provides only one ConcurrentDirectory instance for each file path, and all storage, query, modification and deletion functions are realized by operating on the ConcurrentDirectory instance. To improve database performance, the thread lock of ConcurrentDirectory uses read-write separation. Under this locking mode all read operations can execute asynchronously, whereas all write operations must execute synchronously in a queued manner. In this way, read operations remain thread-safe in a multithreaded environment while avoiding unnecessary queuing, which greatly improves the performance of the database.
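A minimal sketch of this arrangement, assuming java.util.concurrent.locks.ReentrantReadWriteLock as the thread lock, is given below. The class name ConcurrentDirectorySketch, the method names forPath, save and query, and the in-memory list standing in for the Lucene index are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; the in-memory list stands in for the Lucene index.
class ConcurrentDirectorySketch {
    private static final Map<String, ConcurrentDirectorySketch> INSTANCES =
            new ConcurrentHashMap<>();

    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> records = new ArrayList<>();

    private ConcurrentDirectorySketch() {}

    // One shared instance per file path.
    static ConcurrentDirectorySketch forPath(String path) {
        return INSTANCES.computeIfAbsent(path, p -> new ConcurrentDirectorySketch());
    }

    // Writers take the exclusive lock, so writes run one at a time.
    void save(String record) {
        lock.writeLock().lock();
        try {
            records.add(record);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers share the read lock and do not block one another.
    List<String> query(String keyword) {
        lock.readLock().lock();
        try {
            List<String> hits = new ArrayList<>();
            for (String r : records) {
                if (r.contains(keyword)) hits.add(r);
            }
            return hits;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```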
It is easy to see that, in writing many bulletins, the most time-consuming work is data statistics and formatting. To simplify this work, a method for inserting content into a DOCX file is proposed herein. The method comprises three main steps.
1. Formulate an OOXML template.
2. Extract information from the Java entity classes that carry it and inject the information into the template.
3. Compress the template containing the information into a document for browsing and distribution.
The template in the first step is written according to the OOXML specification. The simplest OOXML document contains three parts: a relationship mapping part, a content type definition part and a body content part. The most important of these, the body content part, is recorded in the "/document.xml" file using the WordprocessingML language. WordprocessingML follows the XML specification and is a markup language. All tags defined in the language begin with w (for example <w:document>); most tags occur in pairs, and a few occur as single tags. The entire document content is placed inside <w:document></w:document>, which contains one pair of <w:body></w:body> tags, and <w:body></w:body> in turn contains several pairs of <w:p></w:p> tags. Each pair of <w:p></w:p> tags represents a paragraph; it may contain several pairs of <w:r></w:r> tags, and a <w:pPr/> tag describing the paragraph style may also be added inside it. Each pair of <w:r></w:r> tags represents a run of consecutive characters; it may contain a <w:rPr/> tag describing the character style and a pair of <w:t></w:t> tags holding the character string.
After the template has been designed, the next task is to inject the document content into it. The class used herein for template content injection is defined as TemplateUtil. Content injection is performed by calling the insert method of TemplateUtil, and its implementation has three steps. In the first step, the entire template is read into memory as a character string. In the second step, tags are located in the string holding the template. This lookup can be done by regular expression matching: the expression "\$\{[^}^||^$]+\}" matches all strings of the form "${...}". In the third step, every tag in the string is replaced with the document content mapped to that tag, and the resulting string is written back to the template file.
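The second and third steps can be sketched with java.util.regex as follows. This is only an illustration under stated assumptions: the class name TemplateSketch and the Map-based signature of insert are hypothetical, the placeholder pattern is a simplified one chosen for the example, and XML escaping of the injected values is omitted.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of tag lookup and replacement; not the TemplateUtil source.
class TemplateSketch {
    // Simplified placeholder pattern (an assumption): "${" + name + "}".
    private static final Pattern TAG = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces every ${name} tag with the content mapped to that name.
    static String insert(String template, Map<String, String> values) {
        Matcher m = TAG.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Tags without a mapped value are left untouched.
            String replacement = values.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Matcher.quoteReplacement is used so that "$" or "\" characters in the injected content are not interpreted as group references.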
The last step in inserting content into the DOCX file is packaging the OOXML file. Under the OOXML format standard, a DOCX document is encoded in Unicode and compressed in ZIP format in accordance with the OPC convention. The ZIP compression function is implemented herein in the ZipCompressor class. Logically, a DOCX document is an OPC package, and the package is in turn a complete set of parts. Each part is identified by a case-insensitive path name, which is a string of segment names separated by forward slashes "/", such as "/pres/slides/slide1.xml"; each part also has its own specific content type. Physically, a ZIP file packaged under the OPC convention is an OPC package: each ZIP item corresponds to one part in the package, and the path name of the item is consistent with the path name of the part. Within the package, "/[Content_Types].xml" defines the content types of the parts. There are also explicit mapping relationships between the parts of the package, and these are all stored in relationships parts. Every relationships part is named in the form ".../_rels/....rels"; for example, if the path name of a part is "/a/b/c.xml", the path name of its relationships part is "/a/b/_rels/c.xml.rels". The main document content of the package is recorded in the document part, which stores it in the file "/document.xml".
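The naming rule for relationships parts and the one-item-per-part ZIP layout can be sketched with java.util.zip. The class name OpcSketch and the methods relsPathFor and pack are hypothetical, and the sketch writes only the parts it is given; it does not by itself produce a complete, valid DOCX package.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch; not the ZipCompressor source.
class OpcSketch {
    // "/a/b/c.xml" -> "/a/b/_rels/c.xml.rels"
    static String relsPathFor(String partName) {
        int slash = partName.lastIndexOf('/');
        return partName.substring(0, slash) + "/_rels/"
                + partName.substring(slash + 1) + ".rels";
    }

    // Writes each part as one ZIP item; the item name drops the leading slash.
    static byte[] pack(Map<String, String> parts) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> part : parts.entrySet()) {
                zip.putNextEntry(new ZipEntry(part.getKey().substring(1)));
                zip.write(part.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```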
The following describes the design principles and implementation of DocumentScript, a scripting language designed and developed herein for the automatic generation of document content.
Processing flow of the language processor
The processing flow of a language processor goes through three basic steps: lexical analysis, syntax analysis, and execution (or generation of machine language). Lexical analysis, also called tokenization, cuts the source code into a number of tokens (Token). Syntax analysis then works out the hierarchical logical relationships between the tokens and generates a number of abstract syntax trees (AST, Abstract Syntax Tree). In execution (or generation of machine language), the interpreter interprets and executes the abstract syntax trees one by one and finally returns the result of execution.
Lexer design
The first step in implementing a language processor is to implement a lexer (Lexer). Unprocessed program source code can be regarded as one long character string made up of a series of short character strings. The role of the lexer is to split this long string of source code into individual Tokens.
The Tokens of the DocumentScript language fall into four classes: string tokens, numeric tokens, identifier tokens and the end-of-file token. String tokens and numeric tokens are easy to understand: they are the character sequences that represent strings and numbers. A string enclosed in quotation marks, such as "123", is a string token rather than a numeric token, even though the part between the quotation marks could denote a number. Identifier tokens are the keywords used in a program, braces "{}", square brackets "[]", parentheses "()", the semicolon ";", variable names and so on. In addition to these three kinds of meaningful Token, a special end-of-file (EOF) token is defined in DocumentScript herein to mark the end of a code file.
A token is defined herein as an abstract class Token, whose fields and methods are shown in the tables below. EOF denotes the end-of-file Token, and EOL is the line-break character defined in DocumentScript. The three methods isNumber, isIdentifier and isString determine whether a token is of numeric, identifier or string type respectively, so the concrete type of a Token can be determined through them. The getText and getNumber methods return the string and the numeric value held in a Token object. Except for getLineNumber, all methods of the Token class are abstract and are implemented one by one in its subclasses.
Token class field summary table:
Token class method summary table:
Based on the parent class Token, three subclasses are defined herein: StrToken, NumToken and IdToken, representing string tokens, numeric tokens and identifier tokens respectively. The end-of-file (EOF) Token only marks the end of a code file and has a simple structure, so it is not implemented as a Token subclass in a separate file; instead it is embedded in the Token class as a static member using the singleton pattern.
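A compact sketch of such a hierarchy is given below. For brevity it departs from the description in one respect: the base class supplies default method bodies instead of declaring the methods abstract, so that the EOF singleton can be created as an empty anonymous subclass. The constructor signatures, the int value type of NumToken and the value chosen for EOL are assumptions.

```java
// Hypothetical sketch of the Token hierarchy; defaults replace abstract methods for brevity.
abstract class Token {
    static final Token EOF = new Token(-1) {}; // singleton end-of-file token
    static final String EOL = "\\n";           // line-break marker (assumed value)

    private final int lineNumber;

    protected Token(int lineNumber) { this.lineNumber = lineNumber; }

    int getLineNumber() { return lineNumber; }
    boolean isNumber() { return false; }
    boolean isIdentifier() { return false; }
    boolean isString() { return false; }
    String getText() { return ""; }
    int getNumber() { throw new UnsupportedOperationException("not a numeric token"); }
}

class NumToken extends Token {
    private final int value;
    NumToken(int line, int value) { super(line); this.value = value; }
    @Override boolean isNumber() { return true; }
    @Override String getText() { return Integer.toString(value); }
    @Override int getNumber() { return value; }
}

class StrToken extends Token {
    private final String literal;
    StrToken(int line, String literal) { super(line); this.literal = literal; }
    @Override boolean isString() { return true; }
    @Override String getText() { return literal; }
}

class IdToken extends Token {
    private final String text;
    IdToken(int line, String text) { super(line); this.text = text; }
    @Override boolean isIdentifier() { return true; }
    @Override String getText() { return text; }
}
```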
A complete piece of program code can be split into a sequence of the four kinds of Token above, and this splitting is done by the lexer (lexical analyzer). The lexer is defined herein as the Lexer class, which performs tokenization by regular expression matching.
The Lexer class has five string fields. comPat, numPat, strPat and idPat are the regular expressions that match comments, numeric Tokens, string Tokens and identifier Tokens respectively, while regexPat matches every legal string in DocumentScript. When the parsing procedure of the Lexer class runs, the lexer reads the source code line by line, checks from left to right whether the content of each line matches regexPat, and extracts every matching string. If a matched string (excluding leading whitespace) matches comPat, it is a comment. If it is a numeric literal, a string literal or an identifier, it matches numPat, strPat or idPat respectively. A Token whose type has been determined is stored in the Token queue to be returned. The remainder is then processed in the same way, and the process repeats until the source code ends, at which point the Lexer has split the source code into a Token queue. In the Lexer class herein, this process is implemented mainly by the readLine and addToken methods.
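The matching loop might look roughly like the sketch below. The actual regular expressions of DocumentScript are not given in the text, so the five patterns here are assumed stand-ins, and the sketch returns "kind:text" labels rather than Token objects to stay self-contained. The field and method names follow the description; the class name LexerPatternSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the patterns are assumptions, not the DocumentScript originals.
class LexerPatternSketch {
    static final String comPat = "//.*";
    static final String numPat = "[0-9]+";
    static final String strPat = "\"[^\"]*\"";
    static final String idPat = "[A-Za-z_][A-Za-z_0-9]*|==|<=|>=|&&|\\|\\||\\p{Punct}";
    // Leading whitespace, then exactly one of the four alternatives (groups 1 to 4).
    static final String regexPat =
            "\\s*(?:(" + comPat + ")|(" + numPat + ")|(" + strPat + ")|(" + idPat + "))";

    private static final Pattern PATTERN = Pattern.compile(regexPat);

    // Splits one source line into "kind:text" labels, scanning from left to right.
    static List<String> readLine(String line) {
        String src = line.trim();
        List<String> tokens = new ArrayList<>();
        Matcher m = PATTERN.matcher(src);
        int pos = 0;
        while (pos < src.length()) {
            m.region(pos, src.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("bad token at column " + pos);
            }
            addToken(tokens, m);
            pos = m.end();
        }
        return tokens;
    }

    // Classifies the match by the group that captured it; comments are discarded.
    private static void addToken(List<String> tokens, Matcher m) {
        if (m.group(1) != null) return;
        if (m.group(2) != null) tokens.add("num:" + m.group(2));
        else if (m.group(3) != null) tokens.add("str:" + m.group(3));
        else tokens.add("id:" + m.group(4));
    }
}
```

The order of the alternatives matters in this sketch: the comment and string patterns come before idPat so that "//" and a quoted string are not split into punctuation tokens.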
Lexer class field summary table
The methods of the Lexer class are summarized in the table below. The Lexer class has a constructor that takes a parameter of type Reader; a Lexer object obtains the source code by receiving a Reader object. The Lexer class also defines the read and peek methods, which drive the lexical analysis. The read method obtains Tokens one by one starting from the beginning of the source code, returning a new Token each time it is called. The peek method is used to pre-read Tokens: peek(i) returns the i-th Token after the Token that the read method is about to return. Once the source code has been read completely, both read and peek return Token.EOF.
Lexer class method summary table:
If only tokenization were needed, a single read method would suffice. During the syntax analysis that follows tokenization, however, the Lexer class must also provide a peek method. Syntax analysis obtains Tokens while constructing the abstract syntax tree, and this construction is a depth-first backtracking process: if a construction error is discovered midway, the parser must back up several tokens and rebuild. To support this backtracking, the method is as follows:
A peek method and a buffer queue for temporarily storing Tokens are provided. When the abstract syntax tree is constructed, the peek method is first used to learn which Tokens will be read next and to store them in the buffer queue; the content of the buffer is then examined; finally, the Tokens are obtained through the read method to construct the abstract syntax tree.
Performance comparison of methods for constructing the abstract syntax tree:
Method          Time complexity    Space complexity
Method one      O(n^2)             O(1)
Method two      O(n)               O(n)
Method three    O(n)               O(1) to O(n)
In the concrete implementation of the Lexer class, each call to the read method checks whether the buffer queue is empty; if so, one Token is added to the queue. The Token at the head of the queue is then returned and removed from the queue. The peek method, by contrast, stores the pre-read Tokens in the buffer queue each time it runs and returns the pre-read Token as its return value without deleting any element from the queue. Both read and peek rely on the fillQueue method to add Tokens to the buffer queue. fillQueue has one int parameter and a boolean return value: the parameter indicates the number of Tokens to read in, and the return value indicates whether the queue was filled successfully. The return value is normally true; false is returned only when the program code has been read completely.
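The interplay of read, peek and fillQueue can be sketched as follows. Strings stand in for Token objects and an Iterator stands in for the line-by-line reader; the class name TokenBufferSketch is hypothetical. In this sketch the int parameter of fillQueue is interpreted as the queue index that must be available, which is one possible reading of "the number of Tokens to read in" and is an assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the buffered read/peek logic; not the Lexer source.
class TokenBufferSketch {
    static final String EOF = "<EOF>";       // stands in for Token.EOF
    private final List<String> queue = new ArrayList<>();
    private final Iterator<String> source;   // stands in for the source reader

    TokenBufferSketch(List<String> tokens) {
        this.source = tokens.iterator();
    }

    // Consumes and returns the next token, or EOF when the source is exhausted.
    String read() {
        return fillQueue(0) ? queue.remove(0) : EOF;
    }

    // Returns the token i positions ahead without removing anything from the queue.
    String peek(int i) {
        return fillQueue(i) ? queue.get(i) : EOF;
    }

    // Ensures the queue holds an element at index i; false once the source runs out.
    private boolean fillQueue(int i) {
        while (i >= queue.size()) {
            if (!source.hasNext()) return false;
            queue.add(source.next());
        }
        return true;
    }
}
```

Because peek never removes elements, a parser can look ahead, abandon a failed alternative, and still obtain the same tokens through read afterwards.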
Design of the syntax analyzer
After lexical analysis, the next task is to build the abstract syntax tree. Syntax analysis assembles the Token sequence from a simple linear structure into a tree structure according to the grammar rules of the language; in short, it constructs the abstract syntax tree. The abstract syntax tree is defined as an interface named ASTree, whose methods are summarized in the table below.
ASTree interface method summary table:
ASTree is an interface definition of the abstract syntax tree rather than a concrete class. To describe an abstract syntax tree completely, a number of node classes implementing the ASTree interface are therefore also designed herein; their functions are described in the table below.
Abstract syntax tree node type explanation:
The node classes that implement the abstract syntax tree can be divided into leaf nodes and non-leaf nodes, defined herein as the ASTLeaf class and the ASTList class respectively. As the names suggest, a leaf node contains no child nodes, whereas a non-leaf node may contain child nodes. There are four leaf node classes: Name, NumberLiteral, StringLiteral and NullStmnt, all of which inherit from ASTLeaf. The non-leaf node classes all inherit from ASTList and fall into three categories: the first is used for program flow control and includes the BlockStmnt, IfStmnt and WhileStmnt classes, which represent the sequential, branch and loop structures respectively; the second is used for processing expressions; the third is used for controlling functions.
Design of the interpreter:
After syntax analysis, the interpreter executes the program. Once the abstract syntax trees have been built, execution is fairly simple: the interpreter only needs to evaluate each abstract syntax tree. Evaluation traverses the entire tree recursively from the root node down to the leaf nodes. Each visited node yields an evaluation return value, and except for leaf nodes, the return value of a node depends on the return values of its child nodes.
To evaluate according to the abstract syntax tree, the class of every node object must have an evaluation method. This method, eval, is defined in ASTree herein in the form public abstract Object eval(Environment env), and every subclass inheriting from ASTree must implement it. Consequently, calling the eval method of the root node object of an abstract syntax tree executes the entire program corresponding to that tree.
The DocumentScript language herein is a scripting language that supports variable definition, so variable scope comes into play, and an environment object is therefore passed to the eval method at execution time. In short, an environment object is a data structure that records the correspondence between variable names and variable values; it is defined herein as the Environment interface. When the program adds a new variable, a key-value pair consisting of the name and initial value of the variable is added to the current environment object; when the variable is used again later, the program retrieves its value from the environment object. If a new value is assigned to the same variable, the scope in which the variable is defined must first be found, and the value is then updated in the environment object of that scope.
The Environment interface is implemented by the BasicEnv class. In BasicEnv, the values object is a HashMap that stores the key-value pairs, and the outer object is the parent environment of the current environment. A child environment stands in an inheritance-like relationship to its parent: the child can access variables defined in the parent, but the parent cannot access variables defined in the child. Variable assignment and lookup are designed very simply and are realized by the put and get methods. Note that put and putNew differ. putNew adds or modifies a variable directly in the current environment object. put first determines the scope of the variable being operated on: if the variable is defined in a parent environment, its value is modified there; if it is defined in the current environment or not defined in any environment, putNew is called to add or modify it in the current environment. Finding the scope in which a variable is defined is delegated to the where method. It first checks whether the variable is defined in the current environment; if not, it recursively calls the where method of the parent environment to check whether the variable is defined there; if the variable is not defined in the current environment and the current environment has no parent, it returns null.
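The scope rules described for put, putNew, get and where can be sketched as below. The field and method names follow the description, while the constructor signature and the use of Object values are assumptions; the Environment interface itself is omitted to keep the sketch self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the nested environment; signatures are assumed.
class BasicEnv {
    private final Map<String, Object> values = new HashMap<>();
    private final BasicEnv outer; // parent environment, or null for the outermost one

    BasicEnv(BasicEnv outer) {
        this.outer = outer;
    }

    // Adds or modifies a variable directly in this environment.
    void putNew(String name, Object value) {
        values.put(name, value);
    }

    // Finds the nearest environment that defines the name, or null if none does.
    BasicEnv where(String name) {
        if (values.containsKey(name)) return this;
        return outer == null ? null : outer.where(name);
    }

    // Updates the variable where it is defined; otherwise defines it here.
    void put(String name, Object value) {
        BasicEnv scope = where(name);
        (scope == null ? this : scope).putNew(name, value);
    }

    // Looks the name up in this environment first, then in the enclosing ones.
    Object get(String name) {
        BasicEnv scope = where(name);
        return scope == null ? null : scope.values.get(name);
    }
}
```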
As noted above, ASTree is abstract and its eval method is not implemented, so the concrete evaluation logic is implemented in its subclasses.
Among the many subclasses of ASTree, the Name, NumberLiteral, StringLiteral and NullStmnt classes are leaf node classes, that is, subclasses of ASTLeaf. Because Name represents a user-defined variable, its eval method retrieves the value from the environment object and throws an exception if the variable is not defined there. NumberLiteral and StringLiteral are literal node classes, so their eval methods merely return the basic value held in their Token; the implementation is simple and is not described further. NullStmnt represents an empty statement with no return value, so it has no eval implementation of its own and directly inherits the eval method of ASTLeaf, which throws an exception if called.
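The leaf node behaviour described above might be sketched as follows. A Map stands in for the Environment, the literal nodes hold their values directly instead of a Token, and the exception types are assumptions; all of this is for illustration only.

```java
import java.util.Map;

// Reduced sketch: a Map stands in for the Environment interface.
abstract class ASTree {
    abstract Object eval(Map<String, Object> env);
}

abstract class ASTLeaf extends ASTree {
    // Inherited by leaves that cannot be evaluated, such as the empty statement.
    @Override
    Object eval(Map<String, Object> env) {
        throw new UnsupportedOperationException("cannot eval: " + getClass().getSimpleName());
    }
}

class NumberLiteral extends ASTLeaf {
    private final int value;
    NumberLiteral(int value) { this.value = value; }
    @Override Object eval(Map<String, Object> env) { return value; }
}

class StringLiteral extends ASTLeaf {
    private final String value;
    StringLiteral(String value) { this.value = value; }
    @Override Object eval(Map<String, Object> env) { return value; }
}

class Name extends ASTLeaf {
    private final String name;
    Name(String name) { this.name = name; }
    @Override Object eval(Map<String, Object> env) {
        // An undefined variable is an error rather than a null value.
        if (!env.containsKey(name)) {
            throw new IllegalStateException("undefined name: " + name);
        }
        return env.get(name);
    }
}

// Empty statement: no eval of its own, so the throwing ASTLeaf version applies.
class NullStmnt extends ASTLeaf {}
```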
To realize the sequential, branch and loop flow control functions of a program, all flow control classes among the non-leaf node classes also implement the eval method.
The eval method of the IfStmnt class first executes the eval method of the condition of the current object. If the return value is true, it executes the eval method of the affirmative code block (thenBlock); if the return value is false and the current IfStmnt object contains a negative code block (elseBlock), it executes the eval method of the negative code block. The condition, the affirmative code block and the negative code block are obtained through the condition, thenBlock and elseBlock methods respectively, and the return values of these methods are obtained by calling child(0), child(1) and child(2) respectively.
The eval method of the WhileStmnt class is a loop. Each iteration first evaluates the loop condition (condition); if the return value is true, the eval method of the loop body (body) is executed, otherwise the loop exits. The loop condition and the loop body are obtained by calling the condition and body methods respectively.
The BlockStmnt class merely represents a code block, so its evaluation simply executes, in order, the eval methods of all the nodes it contains, with no other complex logic; the concrete implementation of its eval method is therefore not described further here.
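The three flow control evaluations can be sketched together as below. The Node interface is a minimal stand-in for ASTree and a Map stands in for the Environment; treating only Boolean.TRUE as true, and returning the value of the last executed statement from a block or loop, are assumptions not stated in the text.

```java
import java.util.List;
import java.util.Map;

// Minimal stand-in for ASTree; a Map stands in for the Environment.
interface Node {
    Object eval(Map<String, Object> env);
}

abstract class ASTList implements Node {
    private final List<Node> children;
    ASTList(List<Node> children) { this.children = children; }
    Node child(int i) { return children.get(i); }
    int numChildren() { return children.size(); }
}

// Sequential structure: evaluates every child in order.
class BlockStmnt extends ASTList {
    BlockStmnt(List<Node> children) { super(children); }
    public Object eval(Map<String, Object> env) {
        Object result = null;
        for (int i = 0; i < numChildren(); i++) result = child(i).eval(env);
        return result;
    }
}

// Branch structure: condition, then-block and optional else-block are children 0, 1, 2.
class IfStmnt extends ASTList {
    IfStmnt(List<Node> children) { super(children); }
    Node condition() { return child(0); }
    Node thenBlock() { return child(1); }
    Node elseBlock() { return numChildren() > 2 ? child(2) : null; }
    public Object eval(Map<String, Object> env) {
        if (Boolean.TRUE.equals(condition().eval(env))) return thenBlock().eval(env);
        Node other = elseBlock();
        return other == null ? null : other.eval(env);
    }
}

// Loop structure: re-evaluates the condition before every pass over the body.
class WhileStmnt extends ASTList {
    WhileStmnt(List<Node> children) { super(children); }
    Node condition() { return child(0); }
    Node body() { return child(1); }
    public Object eval(Map<String, Object> env) {
        Object result = null;
        while (Boolean.TRUE.equals(condition().eval(env))) result = body().eval(env);
        return result;
    }
}
```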
An expression necessarily has a return value, so all expression processing classes among the non-leaf node classes implement the eval method. The definition of an expression is fairly broad: it may be an ordinary literal or an operator expression. The evaluation of numeric literals and string literals was introduced above and is not repeated here; the implementation of the eval method for operator expressions is explained below.
Operator expressions can be divided into unary and binary expressions. For unary expressions, DocumentScript implements evaluation only for the negation expression and the negative-value expression. Their implementation is simple: the original value is negated or its sign is reversed, and the result is returned, so the details are not described further. It should be noted, however, that the type of the operand must be checked before a unary operation is executed: the negative-value operation applies only to numeric objects, and the negation operation applies only to integer objects.
Binary operator expressions are defined herein as the BinaryExpr class. In BinaryExpr, the left, right and operator methods return the left operand, the right operand and the operator respectively. When the eval method runs, the program first determines whether the current binary expression is of assignment type or computation type; for an assignment it calls the computeAssign method, and for a computation it calls the computeOp method. The computeOp method first determines the types of the left and right operands: if they are of string type, the current method handles them directly; if they are of numeric type, they are handed over to the computeNumber method.
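The dispatch among computeAssign, computeOp and computeNumber might be sketched as follows. The sketch works on already evaluated operands rather than on syntax tree nodes, uses a Map in place of the Environment, and supports only a handful of operators; the operator set, the string handling and the class name BinaryExprSketch are assumptions.

```java
import java.util.Map;

// Hypothetical sketch of the binary expression dispatch; not the BinaryExpr source.
class BinaryExprSketch {
    // "=" is treated as assignment; every other operator as a computation.
    static Object eval(Object left, String op, Object right, Map<String, Object> env) {
        if ("=".equals(op)) return computeAssign((String) left, right, env);
        return computeOp(left, op, right);
    }

    // Stores the value under the variable name and returns the assigned value.
    static Object computeAssign(String name, Object value, Map<String, Object> env) {
        env.put(name, value);
        return value;
    }

    // Numeric operands are delegated; anything else is handled here as a string.
    static Object computeOp(Object left, String op, Object right) {
        if (left instanceof Integer && right instanceof Integer) {
            return computeNumber((Integer) left, op, (Integer) right);
        }
        if ("+".equals(op)) return String.valueOf(left) + right; // concatenation
        if ("==".equals(op)) return left.equals(right);
        throw new IllegalArgumentException("bad operator for strings: " + op);
    }

    static Object computeNumber(int a, String op, int b) {
        switch (op) {
            case "+": return a + b;
            case "-": return a - b;
            case "*": return a * b;
            case "/": return a / b;
            case "==": return a == b;
            case "<": return a < b;
            case ">": return a > b;
            default: throw new IllegalArgumentException("bad operator: " + op);
        }
    }
}
```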
The above embodiments are intended to illustrate the present invention rather than to limit it. Any modification or change made to the present invention within the spirit of the invention and the scope of protection of the claims falls within the scope of protection of the present invention.

Claims (10)

1. A cross-platform efficient automatic generation method for heterogeneous data bulletins, characterized in that it comprises the following steps:
(1) processing of massive heterogeneous data: data is managed using the SX404DB key-value centralized database, the SX404DB key-value database being a key-value NoSQL database based on inverted index technology;
(2) automatic generation of bulletin content: bulletin content is generated dynamically by the DocumentScript script control system;
(3) automatic typesetting of the bulletin format: completed by injecting content into a format template based on Office OpenXML and compressing it into a DOCX file.
2. The cross-platform efficient automatic generation method for heterogeneous data bulletins according to claim 1, characterized in that managing data using the SX404DB key-value centralized database specifically comprises: creating a database Session object to initiate a database session with the SX404DB database; specifically, first creating an entity object, then setting several attributes of the entity object, namely code, type, region and time; and then calling the query method of the session object with this combined condition as the parameter to query all entity objects that satisfy the corresponding conditions.
3. The cross-platform efficient automatic generation method for heterogeneous data bulletins according to claim 1, characterized in that managing data using the SX404DB key-value centralized database is specifically carried out by the following packages contained in SX404DB:
the convertor package, which implements format conversion between data objects;
the Directory package, which implements index directory management;
the Index package, which implements querying and modification of the index;
the Properties package, which implements configuration of the database;
the Session package, which manages database access sessions;
the Condition package, which provides condition functions for data operations;
the Sort package, which implements sorting in data queries;
wherein the user accesses resources in the database through the overloaded save, delete, update and query methods of the Session class, and the Session class has a dependency relationship with the Searcher class, the Processer class and the DocumentConvertor class;
the Session class queries data through the Searcher class, modifies data through the Processer class, and converts data object formats through the DocumentConvertor class;
the ConcurrentDirectory class has an aggregation relationship with the Searcher class and the Processer class, and an object of the ConcurrentDirectory class appears as an attribute in each of the Searcher class and the Processer class;
the Processer class provides the following methods for invocation: the delete method performs logical deletion of data, and when the delete method is called the affected data enters a recycle area, which is emptied by the clearTrash method; the forceDelete method is a physical deletion method, and physically deleted data cannot be restored; the insert method adds data; the update method modifies data;
the entire SX404DB database provides only one ConcurrentDirectory instance for each file path, and storage, query, modification and deletion are realized by operating on the ConcurrentDirectory instance; the thread lock of ConcurrentDirectory uses read-write separation, and all write operations are executed synchronously in a queued manner.
4. The cross-platform efficient automatic generation method for heterogeneous data bulletins according to claim 1, characterized in that managing data using the SX404DB key-value centralized database specifically comprises: storing all complete data objects in units of the Document class, each Document object containing several Field members.
5. The cross-platform efficient automatic generation method for heterogeneous data bulletins according to claim 1, characterized in that dynamically generating bulletin content by the DocumentScript script control system specifically comprises:
(1) lexical analysis, in which the source code is cut into a number of tokens;
(2) syntax analysis, in which the hierarchical logical relationships between the tokens are worked out and a number of abstract syntax trees are generated;
(3) generation of machine language, in which an interpreted language processor is used, the interpreter interprets and executes the abstract syntax trees one by one, and the result of execution is returned.
6. The cross-platform efficient automatic generation method for heterogeneous data bulletins according to claim 5, characterized in that the lexical analysis implements a lexer as a Lexer class that splits the source code string into Tokens, tokenization being completed by regular expression matching, specifically:
the Tokens are divided into a string token class, a numeric token class, an identifier token class and an end-of-file token class, the end-of-file Token being embedded in the Token class as a static member using the singleton pattern;
the Lexer class is provided with the string fields comPat, numPat, strPat and idPat, which are the regular expressions matching comments, numeric Tokens, string Tokens and identifier Tokens respectively, and is further provided with a string field regexPat, whose expression matches every legal string in DocumentScript; when the parsing procedure of the Lexer class runs, the lexer reads the source code line by line, checks from left to right whether the content of each line matches regexPat, and extracts every matching string;
a Lexer object obtains the source code by receiving a Reader object;
during the syntax analysis that follows tokenization, the Lexer class provides a peek method; construction of the abstract syntax tree is a depth-first backtracking process in which, if a construction error is discovered midway, several tokens must be backed up and construction repeated; specifically, a peek method and a buffer queue for temporarily storing Tokens are provided; when the abstract syntax tree is constructed, the peek method is first used to learn which Tokens will be read next and to store them in the buffer queue, the content of the buffer is then examined, and finally the Tokens are obtained through the read method to construct the abstract syntax tree;
in the concrete implementation of the Lexer class, each call to the read method checks whether the buffer queue is empty and, if so, adds one Token to the buffer queue, then returns the Token at the head of the buffer queue and removes it from the queue; each time the peek method runs, it stores the pre-read Tokens in the buffer queue and returns the pre-read Token as the function return value without deleting any element from the buffer queue;
generating a number of abstract syntax trees specifically comprises assembling the Token sequence from a simple linear structure into a tree structure according to the grammar rules of the language, the abstract syntax tree being defined as an interface named ASTree, with several node classes provided that implement the ASTree interface.
7. The cross-platform efficient automatic generation method for heterogeneous data bulletins according to claim 5, characterized in that interpreting and executing the abstract syntax trees one by one by the interpreter and returning the result of execution specifically comprises: the interpreter evaluates each abstract syntax tree, the evaluation traversing the entire abstract syntax tree recursively from the root node down to the leaf nodes; each visited node yields an evaluation return value, and except for leaf nodes, the return value of a node depends on the return values of its child nodes.
8. The cross-platform efficient automatic generation method for heterogeneous data bulletins according to claim 6, characterized in that the lexer reading the source code line by line specifically comprises: when a matched string, excluding leading whitespace, matches comPat, the string is a comment; when the string is a numeric literal, a string literal or an identifier, it matches numPat, strPat or idPat respectively; a Token whose type has been determined is stored in the Token queue to be returned; the remainder is processed in the same way, and the process repeats until the source code ends, at which point the Lexer has split the source code into a Token queue.
9. The efficient automatic generation method for cross-platform heterogeneous data bulletins as claimed in claim 6, characterized in that the several node classes implementing the ASTree interface are divided into leaf nodes and non-leaf nodes; a leaf node contains no child nodes, whereas a non-leaf node may contain child nodes. The leaf nodes comprise four classes, namely the Name class, the NumberLiteral class, the StringLiteral class, and the NullStmnt class, all of which inherit from the leaf node class; all non-leaf nodes inherit from the non-leaf node class. The non-leaf nodes are divided into three categories: the first is used for program flow control and includes the sequence structure class, the branch structure class, and the loop structure class; the second is used for processing expressions; and the third is used for functions.
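The class hierarchy in claim 9 might look like the sketch below. The four leaf class names come from the claim; the base class names `ASTLeaf` and `ASTList`, the methods `numChildren` and `child`, and the example non-leaf classes `BlockStmnt` and `BinaryExpr` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

interface ASTree {
    int numChildren();
    ASTree child(int i);
}

// Base class for leaf nodes (the name ASTLeaf is an assumption).
class ASTLeaf implements ASTree {
    final String text;
    ASTLeaf(String text) { this.text = text; }
    public int numChildren() { return 0; }
    public ASTree child(int i) {
        throw new IndexOutOfBoundsException("a leaf node has no children");
    }
}

// Base class for non-leaf nodes (the name ASTList is an assumption).
class ASTList implements ASTree {
    private final List<ASTree> children;
    ASTList(ASTree... children) { this.children = Arrays.asList(children); }
    public int numChildren() { return children.size(); }
    public ASTree child(int i) { return children.get(i); }
}

// The four leaf classes named in the claim.
class Name extends ASTLeaf { Name(String id) { super(id); } }
class NumberLiteral extends ASTLeaf { NumberLiteral(String digits) { super(digits); } }
class StringLiteral extends ASTLeaf { StringLiteral(String s) { super(s); } }
class NullStmnt extends ASTLeaf { NullStmnt() { super(""); } }

// Hypothetical non-leaf classes: a sequence structure and an expression.
class BlockStmnt extends ASTList { BlockStmnt(ASTree... stmts) { super(stmts); } }
class BinaryExpr extends ASTList {
    BinaryExpr(ASTree left, ASTree op, ASTree right) { super(left, op, right); }
}

final class Trees {
    // Counts leaf nodes by walking the tree through the ASTree interface only.
    static int countLeaves(ASTree t) {
        if (t instanceof ASTLeaf) return 1;
        int total = 0;
        for (int i = 0; i < t.numChildren(); i++) {
            total += countLeaves(t.child(i));
        }
        return total;
    }
}

class TreesDemo {
    public static void main(String[] args) {
        ASTree assign = new BinaryExpr(new Name("x"), new Name("="), new NumberLiteral("1"));
        ASTree program = new BlockStmnt(assign, new NullStmnt());
        System.out.println(Trees.countLeaves(program));
    }
}
```

Because callers such as `countLeaves` depend only on the ASTree interface, new node classes can be added to either branch of the hierarchy without changing the traversal code.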
10. The efficient automatic generation method for cross-platform heterogeneous data bulletins as claimed in claim 6, characterized in that the Lexer object obtains the source code by receiving a Reader object and specifically provides two methods, read and peek. The read method obtains Tokens one by one starting from the beginning of the source code and returns a new Token each time it is called. The peek method is used to pre-read Tokens: peek(i) returns the i-th Token after the Token that the read method is about to return. Once the source code has been read completely, both the read method and the peek method return Token.EOF.
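The read/peek contract and the buffer queue behind it can be sketched as below. This is a simplified illustration: Tokens are produced by splitting each line on whitespace, which stands in for the pattern-based tokenizer, and the `Token` class shown here is a hypothetical minimal version with only a text field and the EOF marker.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal Token: its text plus a shared end-of-file marker.
final class Token {
    static final Token EOF = new Token("<EOF>");
    final String text;
    Token(String text) { this.text = text; }
}

final class Lexer {
    private final BufferedReader reader;
    private final List<Token> queue = new ArrayList<>(); // the buffer queue
    private boolean hasMore = true;

    Lexer(Reader r) { reader = new BufferedReader(r); }

    // Returns the Token at the head of the buffer queue and removes it.
    Token read() {
        return fillQueue(0) ? queue.remove(0) : Token.EOF;
    }

    // Returns the i-th Token after the one read() would return; removes nothing.
    Token peek(int i) {
        return fillQueue(i) ? queue.get(i) : Token.EOF;
    }

    // Ensures the queue holds at least i + 1 Tokens, reading further lines as needed.
    private boolean fillQueue(int i) {
        while (i >= queue.size()) {
            if (!hasMore) return false;
            readLine();
        }
        return true;
    }

    // Reads one line and appends its Tokens; whitespace splitting is a stand-in tokenizer.
    private void readLine() {
        String line;
        try {
            line = reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (line == null) {
            hasMore = false;
            return;
        }
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) queue.add(new Token(word));
        }
    }
}

class LexerDemo {
    public static void main(String[] args) {
        Lexer lexer = new Lexer(new StringReader("a = 1\nb"));
        System.out.println(lexer.peek(1).text); // look ahead without consuming
        for (Token t = lexer.read(); t != Token.EOF; t = lexer.read()) {
            System.out.println(t.text);
        }
    }
}
```

Under this reading, `peek(0)` returns the same Token that the next `read()` call will return, and both methods keep returning `Token.EOF` once the input is exhausted.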

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810811216.4A CN108959626B (en) 2018-07-23 2018-07-23 Efficient automatic generation method for cross-platform heterogeneous data profile

Publications (2)

Publication Number Publication Date
CN108959626A true CN108959626A (en) 2018-12-07
CN108959626B CN108959626B (en) 2023-06-13

Family

ID=64464317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810811216.4A Active CN108959626B (en) 2018-07-23 2018-07-23 Efficient automatic generation method for cross-platform heterogeneous data profile

Country Status (1)

Country Link
CN (1) CN108959626B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477568A (en) * 2009-02-12 2009-07-08 清华大学 Integrated retrieval method for structured data and non-structured data
CN104615526A (en) * 2014-12-05 2015-05-13 北京航空航天大学 Monitoring system of large data platform
CN105468571A (en) * 2015-11-19 2016-04-06 中国地质大学(武汉) Method and device used for automatically generating report
CN105912633A (en) * 2016-04-11 2016-08-31 上海大学 Sparse sample-oriented focus type Web information extraction system and method
EP3107014A1 (en) * 2015-06-15 2016-12-21 Palantir Technologies, Inc. Data aggregation and analysis system
CN106484767A (en) * 2016-09-08 2017-03-08 中国科学院信息工程研究所 A kind of event extraction method across media
CN106649455A (en) * 2016-09-24 2017-05-10 孙燕群 Big data development standardized systematic classification and command set system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
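R. BETIK et al.: "Automatic Generation of Synthetic XML Document", Diplomová práce, Univerzita Karlova *
LI Lei: "Development of a Hadoop-based RSS content crawling and typesetting system", China Master's Theses Full-text Database, Information Science and Technology *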

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977547A (en) * 2019-03-27 2019-07-05 北京金和网络股份有限公司 Big data bulletin generation method based on dynamic modeling
CN110610068A (en) * 2019-09-16 2019-12-24 郑州昂视信息科技有限公司 Method and device for application isomerization
CN110610068B (en) * 2019-09-16 2021-11-23 郑州昂视信息科技有限公司 Method and device for application isomerization
CN111143403A (en) * 2019-12-10 2020-05-12 跬云(上海)信息科技有限公司 SQL conversion method and device and storage medium
CN111143403B (en) * 2019-12-10 2021-05-14 跬云(上海)信息科技有限公司 SQL conversion method and device and storage medium
CN111539200A (en) * 2020-04-22 2020-08-14 北京字节跳动网络技术有限公司 Method, device, medium and electronic equipment for generating rich text
CN111539200B (en) * 2020-04-22 2023-08-18 北京字节跳动网络技术有限公司 Method, device, medium and electronic equipment for generating rich text
CN116450747A (en) * 2023-06-16 2023-07-18 长沙数智科技集团有限公司 Heterogeneous system collection processing system for office data
CN116450747B (en) * 2023-06-16 2023-08-29 长沙数智科技集团有限公司 Heterogeneous system collection processing system for office data

Also Published As

Publication number Publication date
CN108959626B (en) 2023-06-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant