CN108959626A - Cross-platform efficient automatic generation method for heterogeneous data bulletins - Google Patents
Cross-platform efficient automatic generation method for heterogeneous data bulletins
- Publication number
- CN108959626A CN201810811216.4A
- Authority
- CN
- China
- Prior art keywords
- class
- token
- data
- bulletin
- database
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a cross-platform efficient automatic generation method for heterogeneous data bulletins, comprising the following steps: 1. Massive heterogeneous data processing: data is managed with the SX404DB key-value centralized database, a key-value NoSQL database based on inverted-index technology. 2. Automatic generation of bulletin content: the bulletin content is generated dynamically by the DocumentScript script control system. 3. Automatic typesetting of the presentation format: content is injected into a format template based on Office OpenXML, and the result is compressed into a DOCX file. Compared with previous bulletin generation methods, this method supports massive heterogeneous data, generates content in a flexible and extensible way, and produces a stable, highly compatible presentation format; it also has good stability, operability and scalability.
Description
Technical field
The present invention relates to computer data processing methods, and in particular to a cross-platform efficient automatic generation method for heterogeneous data bulletins.
Background art
The tobacco industry has a huge market structure and organizational structure. Tobacco management involves tobacco market supervision, tobacco leaf production, tobacco sales and other links, and the market contains a large number of tobacco sales outlets. Managing the tobacco industry has therefore always been a heavy workload.
Computerized data management has long been in place in tobacco management. Management work, however, involves not only how to standardize the process of each link, but also how to provide the management unit with a bulletin that faithfully reflects the current state of the tobacco industry in brief data, giving decision-makers a reference for management.
Office automation has always been an important area of computer application, and electronic manuscript technology is an important branch of it. In the narrow sense, an electronic manuscript is a digital manuscript that can be read, edited, published or printed on an electronic device; in the broad sense, the term can cover all multimedia digital documents. An electronic bulletin is a special electronic manuscript dedicated to reporting and explanation, and it is widely used in fields such as telecommunications, transportation, finance, education and geology. It is a functional manuscript: electronic manuscripts with the same function are very similar in content and form, and sometimes everything except the data is almost identical. Because writing bulletins often takes a large amount of manual labor, research on automatic bulletin generation is of great significance.
Automatic bulletin generation can be divided into three levels: data organization, bulletin content generation, and format setting. In traditional application systems the data source is usually a relational database, whose strong transactional consistency and ease of use are beyond question. Modern electronic bulletins, however, are mostly no longer written from small-scale statistics but from large amounts of multi-source heterogeneous data, so a traditional relational database is clearly unsuitable as the data source for a modern automatic bulletin generation system. Most existing bulletin generation systems generate content in one of two ways. One generates the bulletin content from start to finish in a programming language; the other generates it by placeholder replacement, that is, from a template. Because most popular document formats are proprietary, previous bulletin generation systems typeset electronic documents by calling third-party APIs, but the typesetting results are often unsatisfactory because the systems are not fully compatible or the help documentation is incomplete.
Summary of the invention
To solve the above problems, the present invention provides a cross-platform efficient automatic generation method for heterogeneous data bulletins that uses a NoSQL database as the data source, generates bulletin content through dynamic scripts, and customizes the presentation format according to the Office OpenXML standard.
The cross-platform efficient automatic generation method for heterogeneous data bulletins of the invention comprises the following steps:
(1) Massive heterogeneous data processing: data is managed with the SX404DB key-value centralized database. The SX404DB key-value database is a key-value NoSQL database based on inverted-index technology.
(2) Automatic generation of bulletin content: the bulletin content is generated dynamically by the DocumentScript script control system.
(3) Automatic typesetting of the presentation format: content is injected into a format template based on Office OpenXML, and the result is compressed into a DOCX file.
In the cross-platform efficient automatic generation method for heterogeneous data bulletins described above, further, managing data with the SX404DB key-value centralized database specifically comprises: creating a database Session object to open a database session with the SX404DB database; first creating an entity object and setting its code, type, region and time attributes; and then calling the query method of the session object with the combination condition as a parameter to retrieve all entity objects that satisfy the corresponding conditions.
In the cross-platform efficient automatic generation method for heterogeneous data bulletins described above, further, managing data with the SX404DB key-value centralized database is specifically carried out by the following packages contained in SX404DB: the convertor package converts formats between data objects; the Directory package manages the index directory; the Index package queries and modifies the index; the Properties package configures the database; the Session package manages database access sessions; the Condition package provides the condition functions used in data operations; and the Sort package provides the sorting function used in data queries.
The user accesses the resources in the database through the overloaded save, delete, update and query methods of the Session class. The Session class has a dependency relationship with the Searcher, Processer and DocumentConvertor classes: it queries data through the Searcher class, modifies data through the Processer class, and converts data object formats through the DocumentConvertor class.
The ConcurrentDirectory class has an aggregation relationship with the Searcher and Processer classes: an object of the ConcurrentDirectory class appears as an attribute in each of the Searcher and Processer classes.
The Processer class provides the following methods: the delete method deletes data logically, so that when delete is called the affected data enters the recovery area, which is emptied by the clearTrash method; the forceDelete method deletes data physically, and physically deleted data cannot be restored; the insert method adds data; and the update method modifies data.
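The difference between logical and physical deletion can be illustrated with a small in-memory sketch. This is not the SX404DB implementation: the class name `RecycleStoreSketch`, the two maps, and the `restore` and `isLive` helpers are assumptions made for illustration; only the method names `insert`, `delete`, `forceDelete` and `clearTrash` come from the text.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the delete semantics described above; not SX404DB code. */
class RecycleStoreSketch {
    private final Map<String, String> live = new HashMap<>();   // data visible to queries
    private final Map<String, String> trash = new HashMap<>();  // recovery area

    void insert(String id, String value) {
        live.put(id, value);
    }

    /** Logical delete: the record moves to the recovery area and stays restorable. */
    boolean delete(String id) {
        String value = live.remove(id);
        if (value == null) {
            return false;
        }
        trash.put(id, value);
        return true;
    }

    /** Physical delete: the record is dropped from both areas and cannot be restored. */
    boolean forceDelete(String id) {
        boolean wasLive = live.remove(id) != null;
        boolean wasTrashed = trash.remove(id) != null;
        return wasLive || wasTrashed;
    }

    /** Empties the recovery area; logically deleted records become unrecoverable. */
    void clearTrash() {
        trash.clear();
    }

    /** Hypothetical helper, not named in the source: undo a logical delete. */
    boolean restore(String id) {
        String value = trash.remove(id);
        if (value == null) {
            return false;
        }
        live.put(id, value);
        return true;
    }

    boolean isLive(String id) {
        return live.containsKey(id);
    }

    public static void main(String[] args) {
        RecycleStoreSketch store = new RecycleStoreSketch();
        store.insert("1", "Chengdu tobacco");
        store.delete("1");
        System.out.println("restored after delete: " + store.restore("1"));
        store.forceDelete("1");
        System.out.println("restored after forceDelete: " + store.restore("1"));
    }
}
```

Keeping the recovery area as a separate structure means a logical delete is cheap and reversible, while `forceDelete` and `clearTrash` are the only operations that actually discard data.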
The entire SX404DB database provides only one ConcurrentDirectory instance per file path, and storage, query, modification and deletion are all carried out by operating on that instance. The thread lock of ConcurrentDirectory separates reads from writes, and all write operations are executed synchronously in a queue.
In the cross-platform efficient automatic generation method for heterogeneous data bulletins described above, further, when data is managed with the SX404DB key-value centralized database, every complete data object is stored as a Document, and each Document object contains several Field members.
In the cross-platform efficient automatic generation method for heterogeneous data bulletins described above, further, dynamically generating the bulletin content through the DocumentScript script control system specifically comprises:
(1) Lexical analysis, which cuts the source code into tokens.
(2) Syntactic analysis, which works out the hierarchical logical relationships between the tokens and generates several abstract syntax trees.
(3) Machine-language generation, which uses an interpreted-language processor: the interpreter interprets and executes the abstract syntax trees one by one and reports the results of execution.
In the cross-platform efficient automatic generation method for heterogeneous data bulletins described above, further, the lexical analysis is implemented by a Lexer class, a tokenizer that splits the source code string into Tokens; tokenization is done by regular-expression matching. Specifically, Tokens are divided into string tokens, numeric tokens, identifier tokens and the end-of-file token. The end-of-file Token is implemented as a singleton embedded in the Token class as a static member.
The Lexer class has the string fields comPat, numPat, strPat and idPat, which hold the regular expressions that match comments, numeric Tokens, string Tokens and identifier Tokens respectively. It also has a string field regexPat, an expression that matches every legal string in DocumentScript. When the lexical analysis of the Lexer class runs, the tokenizer reads the source code line by line, checks from left to right whether the content of each line matches regexPat, and extracts every matching string.
The Lexer object obtains the source code by receiving a Reader object.
For the syntactic analysis that follows tokenization, the Lexer class provides a peek method. Building an abstract syntax tree is a depth-first process with backtracking: if a construction error is discovered midway, several tokens must be given back and the construction retried. The Lexer class therefore provides a peek method and a buffer queue that temporarily stores Tokens. When an abstract syntax tree is being built, the peek method is used first to learn which Tokens will be read next and to place them in the buffer queue; the content of the buffer is then examined; finally the Tokens are obtained through the read method and used to build the tree.
In the implementation of the Lexer class, the read method checks on every call whether the buffer queue is empty. If it is, a Token is added to the queue; the Token at the head of the queue is then returned and removed from the queue. Each time it runs, the peek method stores the pre-read Tokens in the buffer queue and returns the result as its return value, without removing any element from the queue.
Generating the abstract syntax trees specifically means assembling the Token sequence from a simple linear structure into a tree structure according to the grammar rules of the language. The abstract syntax tree is defined as an interface named ASTree, and several node classes implement the ASTree interface.
In the cross-platform efficient automatic generation method for heterogeneous data bulletins described above, further, the interpreter interpreting and executing the abstract syntax trees one by one and reporting the results specifically means that the interpreter evaluates each abstract syntax tree. Evaluation traverses the whole tree recursively from the root node down to the leaf nodes. Every visited node has an evaluation return value, and the return value of every node other than a leaf depends on the return values of its child nodes.
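Recursive evaluation of this kind can be sketched as follows. The interface name ASTree and the leaf class NumberLiteral come from the text; the `eval` method signature, the integer-only values, and the `BinaryExpr` node are assumptions chosen to keep the example small, not the DocumentScript implementation.

```java
import java.util.Arrays;
import java.util.List;

class AstEvalSketch {
    /** Every node can be evaluated; the name ASTree is from the text, eval() is assumed. */
    interface ASTree {
        int eval();
    }

    /** Leaf node: its value does not depend on any child. */
    static final class NumberLiteral implements ASTree {
        private final int value;

        NumberLiteral(int value) {
            this.value = value;
        }

        @Override
        public int eval() {
            return value;
        }
    }

    /** Hypothetical non-leaf node: its value depends on the values of its two children. */
    static final class BinaryExpr implements ASTree {
        private final ASTree left;
        private final char op;
        private final ASTree right;

        BinaryExpr(ASTree left, char op, ASTree right) {
            this.left = left;
            this.op = op;
            this.right = right;
        }

        @Override
        public int eval() {
            int l = left.eval();   // recurse into the children first
            int r = right.eval();
            switch (op) {
                case '+': return l + r;
                case '-': return l - r;
                case '*': return l * r;
                default: throw new IllegalStateException("unknown operator: " + op);
            }
        }
    }

    /** Evaluates the trees one by one and collects the result of each. */
    static int[] run(List<ASTree> program) {
        int[] results = new int[program.size()];
        for (int i = 0; i < results.length; i++) {
            results[i] = program.get(i).eval();
        }
        return results;
    }

    public static void main(String[] args) {
        ASTree sum = new BinaryExpr(new NumberLiteral(1), '+', new NumberLiteral(2));
        ASTree tree = new BinaryExpr(sum, '*', new NumberLiteral(3));
        System.out.println(Arrays.toString(run(Arrays.asList(tree, new NumberLiteral(7)))));
    }
}
```

Because each node class carries its own `eval`, adding a new kind of statement only requires a new class that implements the interface.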
In the cross-platform efficient automatic generation method for heterogeneous data bulletins described above, further, the tokenizer reading the source code line by line specifically means: when a matched string, excluding its leading whitespace, matches comPat, the string is a comment; when the string is a numeric literal, a string literal or an identifier, it matches numPat, strPat or idPat respectively. A Token whose type has been determined is stored in the Token queue to be returned. The remaining part is processed in the same way, repeatedly, until the source code ends, at which point the Lexer has split the source code into a Token queue.
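The line-by-line matching described above might look like the following sketch. The field names comPat, numPat, strPat, idPat and regexPat are taken from the text, but the regular expressions themselves are assumptions, since the source does not give them; the `Kind` enum and the `tokenize` method are likewise illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LexerSketch {
    enum Kind { NUMBER, STRING, IDENTIFIER, EOF }

    static final class Token {
        static final Token EOF = new Token(Kind.EOF, "");  // single shared end-of-file token
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Assumed pattern contents; the source names these fields but not their expressions.
    static final String comPat = "//.*";
    static final String numPat = "[0-9]+";
    static final String strPat = "\"(?:\\\\.|[^\"\\\\])*\"";
    static final String idPat = "[A-Za-z_][A-Za-z_0-9]*|==|<=|>=|&&|\\|\\||\\p{Punct}";
    // Leading blanks, then at most one comment, number, string or identifier.
    static final Pattern regexPat = Pattern.compile(
            "\\s*(?:(?<com>" + comPat + ")|(?<num>" + numPat + ")|(?<str>" + strPat
            + ")|(?<id>" + idPat + "))?");

    static List<Token> tokenize(String source) {
        List<Token> queue = new ArrayList<>();
        for (String line : source.split("\n")) {
            Matcher m = regexPat.matcher(line);
            int pos = 0;
            while (pos < line.length()) {
                m.region(pos, line.length());
                if (!m.lookingAt() || m.end() == pos) {
                    throw new IllegalArgumentException("bad token at column " + pos + ": " + line);
                }
                if (m.group("num") != null) {
                    queue.add(new Token(Kind.NUMBER, m.group("num")));
                } else if (m.group("str") != null) {
                    String s = m.group("str");
                    queue.add(new Token(Kind.STRING, s.substring(1, s.length() - 1)));
                } else if (m.group("id") != null) {
                    queue.add(new Token(Kind.IDENTIFIER, m.group("id")));
                }
                // A comment or trailing blanks match but produce no token.
                pos = m.end();
            }
        }
        queue.add(Token.EOF);
        return queue;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("total = 12 // note\nprint \"hi\"")) {
            System.out.println(t.kind + " " + t.text);
        }
    }
}
```

The order of the alternatives matters in this sketch: the comment and string patterns are tried before the identifier pattern, because the latter also accepts single punctuation characters such as `/` and `"`.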
In the cross-platform efficient automatic generation method for heterogeneous data bulletins described above, further, the node classes that implement the ASTree interface are divided into leaf nodes and non-leaf nodes. A leaf node contains no child nodes; a non-leaf node may contain child nodes. There are four leaf node classes, Name, NumberLiteral, StringLiteral and NullStmnt, all of which inherit from the leaf node class; the non-leaf nodes all inherit from the non-leaf node class. Non-leaf nodes fall into three categories: the first is used for program flow control and includes the sequential, branch and loop structure classes; the second is used for processing expressions; and the third is used for controlling functions.
In the cross-platform efficient automatic generation method for heterogeneous data bulletins described above, further, the Lexer object, which obtains the source code by receiving a Reader object, provides two methods, read and peek. The read method obtains Tokens one by one starting from the head of the source code and returns a new Token on each call. The peek method pre-reads Tokens: peek(i) returns the i-th Token after the Token that the read method is about to return. Once the source code has been read completely, both read and peek return Token.EOF.
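A minimal sketch of the read and peek behaviour with a buffer queue is given below. Tokens are plain strings here, the token source is a pre-built list rather than a Reader, and the string `"<EOF>"` stands in for Token.EOF; all three are simplifications. The sketch also assumes that peek(0) refers to the Token the next read call would return, which the text does not state explicitly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class TokenBufferSketch {
    static final String EOF = "<EOF>";  // stand-in for Token.EOF

    private final Iterator<String> source;                  // stand-in for the tokenizer fed by a Reader
    private final List<String> buffer = new ArrayList<>();  // pre-read tokens not yet consumed

    TokenBufferSketch(List<String> tokens) {
        this.source = tokens.iterator();
    }

    /** Makes sure the buffer holds at least i + 1 tokens; returns false if the source runs out first. */
    private boolean fill(int i) {
        while (buffer.size() <= i) {
            if (!source.hasNext()) {
                return false;
            }
            buffer.add(source.next());
        }
        return true;
    }

    /** Returns the token at the head of the buffer and removes it. */
    String read() {
        return fill(0) ? buffer.remove(0) : EOF;
    }

    /** Looks ahead without consuming; assumes peek(0) is the token read() would return next. */
    String peek(int i) {
        return fill(i) ? buffer.get(i) : EOF;
    }

    public static void main(String[] args) {
        TokenBufferSketch lexer = new TokenBufferSketch(Arrays.asList("if", "x", "{"));
        System.out.println(lexer.peek(1));  // look ahead without consuming
        System.out.println(lexer.read());   // consume the head of the queue
    }
}
```

Since peek never removes anything from the buffer, a parser can inspect as many upcoming tokens as it needs and still fall back to a different rule before calling read.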
Beneficial effects of the present invention:
Compared with previous bulletin generation methods, this method supports massive heterogeneous data, generates content in a flexible and extensible way, and produces a stable, highly compatible presentation format. The system has good stability, operability and scalability.
SX404DB, the inverted-index-based NoSQL database of the invention, is a key-value NoSQL database. It is built on Lucene indexes and uses a star model to describe and manage data. With its help, developers no longer need to trouble themselves with tedious table creation or database compatibility problems, because SX404DB has no tables at all: all data is under spider-style management, with neither hierarchical relationships nor constraint relationships between data items. The database can easily manage structured, semi-structured and unstructured data, and even supports direct storage and retrieval of Java objects.
The present invention implements a content injection method for DOCX documents of fixed format.
In writing a bulletin, format setting costs considerable time and effort, yet this part of the work can be handed over entirely to the computer. DOCX, the standard format of Microsoft Office documents, has a very high market share in office documents, and the document content injection method implemented here targets this format. With this method, the corresponding content only needs to be injected into the template according to the Office OpenXML standard to unify the document format easily. Using the method saves bulletin writers a great deal of time otherwise spent on format setting.
The present invention designs DocumentScript, a scripting language for automatically generating document content, and develops a processing platform for the language in the Java environment.
Bulletin content is usually generated automatically by tag replacement, whereas the method here uses script control. Document content generated this way is more flexible: it supports not only the writing of sequential documents but also conditional and loop-based generation of content. DocumentScript is a scripting language, that is, an interpreted language that runs without being compiled for a specific platform, so controlling content generation with it gives better platform independence and extensibility.
Specific embodiment
SX404DB of the invention is a key-value NoSQL database based on inverted-index technology and developed in Java. On top of the inverted index built by Lucene, it implements an ORM mechanism through dynamic proxy technology, ultimately providing data storage and access in units of Java objects.
All database operations in SX404DB are based on a database session, so a database session object, that is, a Session object, must be created before the database is accessed. Through a Session object the system can conveniently access and manage the entire SX404DB database.
Example of storing data: first create an entity object of the tobacco sales total class (which can be any user-defined class that conforms to the JavaBean specification), then set the object's id, type, province and time attributes to "1", "Chengdu tobacco", "Chengdu" and "2016-03-01", and finally call the save method of the session object to store the object.
Example of querying data: first create a MultiCondition (combination condition) object, then create two simple conditions (province equal to "Chengdu" and time equal to "2016-03-01") and add both to the combination condition as required conditions (ConditionOccur.MUST marks a condition as required within the combined query). Finally, call the query method of the session object with the combination condition as a parameter to retrieve all tobacco sales total objects that satisfy the conditions.
Example of modifying data: first create a simple condition with id equal to "1", then create a modification item with province equal to "Hunan", and finally call the update method of the session object with the query condition and the modification item as parameters, which changes the province of every tobacco sales total object in the database whose id is "1" to "Hunan".
Example of deleting data: first create a simple condition with id equal to "1", then call the delete method of the session object with that condition as a parameter to delete every tobacco sales total object in the database whose id is "1".
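The store, query, modify and delete examples can be mimicked with an in-memory stand-in. The names MultiCondition, ConditionOccur.MUST, save, query, update and delete come from the text; everything else is assumed. In particular, the method signatures are hypothetical, entities are modelled as attribute maps rather than JavaBeans, and the `Session` class below is a toy list-backed substitute, not the SX404DB session.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class ConditionQuerySketch {
    enum ConditionOccur { MUST }

    /** Simple condition: the named attribute must equal the given value. */
    static Predicate<Map<String, String>> simple(String field, String value) {
        return entity -> value.equals(entity.get(field));
    }

    /** Combination condition: an entity matches only if every MUST condition holds. */
    static final class MultiCondition {
        private final List<Predicate<Map<String, String>>> required = new ArrayList<>();

        MultiCondition add(Predicate<Map<String, String>> condition, ConditionOccur occur) {
            if (occur == ConditionOccur.MUST) {
                required.add(condition);
            }
            return this;
        }

        boolean matches(Map<String, String> entity) {
            for (Predicate<Map<String, String>> condition : required) {
                if (!condition.test(entity)) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Toy list-backed session with hypothetical signatures. */
    static final class Session {
        private final List<Map<String, String>> store = new ArrayList<>();

        void save(Map<String, String> entity) {
            store.add(entity);
        }

        List<Map<String, String>> query(MultiCondition condition) {
            List<Map<String, String>> hits = new ArrayList<>();
            for (Map<String, String> entity : store) {
                if (condition.matches(entity)) {
                    hits.add(entity);
                }
            }
            return hits;
        }

        /** Sets one attribute on every matching entity and returns how many were changed. */
        int update(MultiCondition condition, String field, String value) {
            List<Map<String, String>> hits = query(condition);
            for (Map<String, String> entity : hits) {
                entity.put(field, value);
            }
            return hits.size();
        }

        /** Removes every matching entity and returns how many were removed. */
        int delete(MultiCondition condition) {
            int before = store.size();
            store.removeIf(condition::matches);
            return before - store.size();
        }
    }

    static Map<String, String> entity(String id, String province, String time) {
        Map<String, String> e = new HashMap<>();
        e.put("id", id);
        e.put("province", province);
        e.put("time", time);
        return e;
    }

    public static void main(String[] args) {
        Session session = new Session();
        session.save(entity("1", "Chengdu", "2016-03-01"));
        MultiCondition condition = new MultiCondition()
                .add(simple("province", "Chengdu"), ConditionOccur.MUST)
                .add(simple("time", "2016-03-01"), ConditionOccur.MUST);
        System.out.println(session.query(condition));
    }
}
```

With only MUST conditions the combination behaves as a logical AND; a fuller condition model would presumably offer other occurrence types, but the source names only this one.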
SX404DB contains 7 packages, each responsible for a different function, as described in the table below. The convertor, directory, index and properties packages are the 4 bottom-layer packages, responsible respectively for format conversion between data objects, index directory management, querying and modifying the index, and the basic configuration of the database. The session, condition and sort packages are the 3 application-layer packages, responsible respectively for database access, the condition functions used in data operations, and the sorting function used in data queries. The classes in these three packages are provided mainly so that third-party applications can access the database.
Package | Description |
cn.edu.cug.sx404.database.condition | Manages query, modification and deletion conditions |
cn.edu.cug.sx404.database.convertor | Converts formats between data objects |
cn.edu.cug.sx404.database.directory | Manages the index directory |
cn.edu.cug.sx404.database.index | Queries and modifies the index |
cn.edu.cug.sx404.database.properties | Configures the database |
cn.edu.cug.sx404.database.session | Manages database access sessions |
cn.edu.cug.sx404.database.sort | Provides sorting for data queries |
Relationship model of the main SX404DB classes: the Session class has several overloaded save, delete, update and query methods, through which users access the resources in the database. The Session class depends on the Searcher, Processer and DocumentConvertor classes; that is, its data query and modification operations are carried out by the Searcher and Processer classes respectively, and its data object format conversion is carried out by the DocumentConvertor class. The ConcurrentDirectory class has an aggregation relationship with the Searcher and Processer classes; in other words, an object of the ConcurrentDirectory class appears as an attribute in each of them.
Adding, deleting, modifying and querying data are the most basic functions of a database. In SX404DB these basic operations on the underlying data are built on the inverted-index technology of the Lucene framework, in which every complete data object is stored as a Document. The basic storage unit of the database is the JavaBean (simply put, a Java entity class that follows object-oriented design principles), but what is actually stored in the database index is not the JavaBean object itself but a Document object that reflects its characteristics. Each Document object contains several Field members, and it is through these Field members that the Document object reflects the features of the JavaBean object.
Logically, an index created by Lucene is composed of Segment, Document, Field and Term. A Lucene index consists of a series of files with different filename prefixes or suffixes; their specific functions are described in the following table of index sub-file functions:
In SX404DB, access to the underlying Documents is implemented by the Searcher and Processer classes: the Searcher class provides data query, and the Processer class provides data addition, modification and deletion.
The Searcher class provides 3 callable methods, summarized in Table 3.1.3. The getInstance method is a static method that creates an instance of the Searcher class. The search method queries data; when executed, it returns a sequence of Documents according to the specified query conditions and sorting rules. The getDocumentByDocID method looks up a Document object by its ID. The summary is as follows:
The Processer class provides 8 callable methods, summarized in the table below. The getInstance method is a static method that creates an instance of the Processer class. The delete method deletes data logically: when it is called, the affected data enters the recovery area, which is emptied by the clearTrash method. The forceDelete method deletes data physically; once data has been physically deleted it cannot be restored. The insert and update methods add and modify data respectively. Processer class method summary table:
When Searcher and Processer access the underlying data, they in fact call Lucene's three main index access classes: IndexWriter, IndexReader and IndexSearcher. By the nature of their index operations, these three classes divide into read classes and write classes. IndexReader and IndexSearcher are read classes: IndexReader extracts the corresponding Document from the index by its unique ID, and IndexSearcher finds the set of IDs of the Documents that match the specified query conditions. IndexWriter is the write class; it lets the caller write to the index, and whenever the user needs to add, delete or modify data, it can be used to reorganize the index and write it to disk or memory.
The encapsulation and conversion of data object
What SX404DB stores at the bottom layer is largely isolated key-value pairs, and the Lucene framework only provides access to Document objects. To support direct storage and retrieval of JavaBean objects, conversion between Document objects and JavaBean objects must therefore be implemented first.
The conversion of SX404DB data objects is implemented mainly by the DocumentConvertor class, whose implementation follows the design idea of ORM, short for Object Relational Mapping. DocumentConvertor provides several type-conversion methods, including convert2Document, convert2Obj, convert2Query, castToQuery, castToField and cast. cast converts between basic data types (such as int, long, double and float) and String. castToQuery and castToField convert key-value pairs into Lucene Query and Field objects. convert2Document and convert2Obj convert between JavaBean and Document. convert2Query converts a JavaBean into a Query.
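The round trip between an entity object and a set of key-value fields can be sketched with plain reflection. This is an illustration under simplifying assumptions, not the DocumentConvertor code: a `Map<String, String>` stands in for the Lucene Document, the methods `toDocument` and `toObject` are hypothetical counterparts of convert2Document and convert2Obj, fields are read directly instead of through getters and setters, and the `amount` attribute is invented to show a numeric cast.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

class BeanConvertSketch {
    /** Example entity; fields are accessed directly to keep the sketch short. */
    static class TobaccoSales {
        String id;
        String province;
        int amount;  // hypothetical numeric attribute, added to show type casting
    }

    /** Entity -> field map (a stand-in for a Document made of Field members). */
    static Map<String, String> toDocument(Object bean) {
        Map<String, String> doc = new LinkedHashMap<>();
        try {
            for (Field f : bean.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                    continue;
                }
                f.setAccessible(true);
                Object value = f.get(bean);
                if (value != null) {
                    doc.put(f.getName(), String.valueOf(value));
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return doc;
    }

    /** Field map -> entity; the type needs an accessible no-argument constructor. */
    static <T> T toObject(Map<String, String> doc, Class<T> type) {
        try {
            T bean = type.getDeclaredConstructor().newInstance();
            for (Field f : type.getDeclaredFields()) {
                String raw = doc.get(f.getName());
                if (raw == null || Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                    continue;
                }
                f.setAccessible(true);
                f.set(bean, cast(raw, f.getType()));
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** String -> basic type, in the spirit of the cast method described in the text. */
    static Object cast(String raw, Class<?> type) {
        if (type == String.class) return raw;
        if (type == int.class || type == Integer.class) return Integer.valueOf(raw);
        if (type == long.class || type == Long.class) return Long.valueOf(raw);
        if (type == double.class || type == Double.class) return Double.valueOf(raw);
        if (type == float.class || type == Float.class) return Float.valueOf(raw);
        throw new IllegalArgumentException("unsupported type: " + type);
    }

    public static void main(String[] args) {
        TobaccoSales sales = new TobaccoSales();
        sales.id = "1";
        sales.province = "Chengdu";
        sales.amount = 42;
        System.out.println(toDocument(sales));
    }
}
```

Storing every value as a string keeps the stand-in document flat, which mirrors the isolated key-value pairs described above; the `cast` step restores the declared field types on the way back.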
To improve the efficiency of the whole database, SX404DB uses a lazy data loading mode; that is, the data in an entity object is loaded only at the moment it is accessed. This mode can also be called dynamic data loading, and it is usually implemented with dynamic proxy technology. To make the proxy pattern easier to implement, the Java language itself provides a dynamic proxy mechanism, but that mechanism requires the proxy class and the implementation class to implement the same interface. Such an implementation brings a lot of extra work to application-layer design, and once the interface has been designed it is hard to extend later. SX404DB therefore uses the third-party class library CGLIB to implement a more flexible dynamic data loading mode. In this mode, every entity object obtained from a database query has been processed by the database: the attributes of the object are all null until they are accessed, the data loading mechanism is invoked dynamically only on access, and the objects do not need to implement any predefined interface. CGLIB is a high-performance bytecode generation library that can modify already defined classes at the bytecode level, which yields a dynamic proxy mechanism independent of any interface.
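For comparison, lazy loading with the JDK's own interface-based dynamic proxy can be sketched as follows. Note that this deliberately shows the built-in `java.lang.reflect.Proxy` mechanism, whose interface requirement the text cites as its limitation, rather than the CGLIB approach SX404DB is said to use. The `SalesRecord` interface, the `LoadedRecord` class with its load counter, and the `lazy` helper are all hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class LazyProxySketch {
    /** The JDK mechanism needs an interface shared by the proxy and the real object. */
    interface SalesRecord {
        String getProvince();
    }

    /** Real object; the counter records how often data was actually loaded. */
    static final class LoadedRecord implements SalesRecord {
        static final AtomicInteger loads = new AtomicInteger();
        private final String province;

        LoadedRecord(String province) {
            loads.incrementAndGet();
            this.province = province;
        }

        @Override
        public String getProvince() {
            return province;
        }
    }

    /** Returns a proxy that calls the loader only when a method is first invoked. */
    static <T> T lazy(Class<T> iface, Supplier<? extends T> loader) {
        InvocationHandler handler = new InvocationHandler() {
            private T target;  // stays null until the first access

            @Override
            public synchronized Object invoke(Object proxy, Method method, Object[] args)
                    throws Throwable {
                if (target == null) {
                    target = loader.get();
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause();
                }
            }
        };
        Object proxy = Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        SalesRecord record = lazy(SalesRecord.class, () -> new LoadedRecord("Chengdu"));
        System.out.println("loads before access: " + LoadedRecord.loads.get());
        System.out.println(record.getProvince());
        System.out.println("loads after access: " + LoadedRecord.loads.get());
    }
}
```

The `SalesRecord` interface is exactly the extra design artifact that a bytecode-level library avoids by subclassing the entity class itself.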
Handling of thread safety:
Thread safety in distributed environments was considered from the very start of Lucene's framework design, so Lucene has a sound lock mechanism of its own. The lock in Lucene, however, is fine-grained and can only guarantee that data read and write operations run normally. In SX404DB the basic logical unit of storage is not the key-value pair but the JavaBean object, and for storing objects of this coarser granularity the basic Lucene lock mechanism cannot guarantee that dirty reads will not occur.
To avoid conflicts between threads, Java provides two thread synchronization mechanisms. The first wraps the code block that needs synchronization with the synchronized keyword; the second achieves synchronization through thread locks. SX404DB uses the thread lock mechanism provided by the Java language itself to build a thread-safe directory management mechanism. The whole database provides only one ConcurrentDirectory instance for each file path, and all store, query, modify and delete functions are implemented by operating on that ConcurrentDirectory instance. To improve database performance, the thread lock of ConcurrentDirectory uses a read-write separation mode. In this mode all read operations can execute concurrently, while all write operations must execute synchronously, one after another in a queue. Reading data in this way both guarantees thread safety in a multithreaded environment and avoids unnecessary queuing, which considerably improves the performance of the database.
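The read-write separation described above can be sketched in Java as follows. This is only an illustration under stated assumptions, not the SX404DB implementation: the forPath factory, the in-memory records list and the choice of java.util.concurrent.locks.ReentrantReadWriteLock are all assumed for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: one shared instance per file path, guarded by a read-write lock. */
final class ConcurrentDirectory {
    // One instance per path; computeIfAbsent makes the creation atomic.
    private static final Map<String, ConcurrentDirectory> INSTANCES = new ConcurrentHashMap<>();

    // Fair mode: waiting threads are served roughly in arrival order.
    private final ReadWriteLock lock = new ReentrantReadWriteLock(true);
    // Stand-in for the real index directory (assumption).
    private final List<String> records = new ArrayList<>();

    private ConcurrentDirectory() { }

    static ConcurrentDirectory forPath(String path) {
        return INSTANCES.computeIfAbsent(path, p -> new ConcurrentDirectory());
    }

    /** Readers share the read lock, so several reads may run at the same time. */
    List<String> readAll() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(records);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Writers take the exclusive write lock, so writes execute one at a time. */
    void write(String record) {
        lock.writeLock().lock();
        try {
            records.add(record);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With such a design, two callers that ask for the same path receive the same object, so a single lock guards every operation on that path.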
It is easy to see that the most time-consuming work in writing many bulletins is data statistics and formatting. To simplify this work, a method for inserting content into a DOCX-format document is proposed herein. The method consists of three main steps.
1. Create an OOXML template.
2. Extract the information from the Java entity classes that carry it and inject it into the template.
3. Compress the filled-in template into a document for viewing and publishing.
The template in the first step is written according to the OOXML specification. The simplest OOXML document contains three parts: a relationship mapping part, a content type definition part and a body content part. The most important of these, the body content part, is recorded in the "/document.xml" file in the WordprocessingML language. WordprocessingML is a markup language that follows the XML specification. All tags defined in the language start with w (for example <w:document>); most tags occur in pairs, and a few occur as single tags. The entire document content is placed inside <w:document></w:document>, which contains one pair of <w:body></w:body> tags, and <w:body></w:body> in turn contains several pairs of <w:p></w:p> tags. Each <w:p></w:p> pair represents a paragraph; it may contain several pairs of <w:r></w:r> tags and may also contain a <w:pPr/> tag that describes the paragraph style. Each <w:r></w:r> pair represents a run of consecutive characters; it may contain a <w:rPr/> tag that describes the character style and a pair of <w:t></w:t> tags that holds the character string.
Once the template has been designed, the next task is to inject the document content into it. The class used for template content injection is defined herein as TemplateUtil. Template content injection is performed by calling the insert method of TemplateUtil, which proceeds in three steps. First, the whole template is read into memory as a character string. Second, the labels are located in the string that holds the template. This lookup can be done with a regular expression: the expression "${[^}^||^$]+" matches all character strings of the form "${...}". Third, every label in the string is replaced with the document content mapped to that label, and the string is written back to the template file once the replacement is complete.
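The second and third steps of the injection can be sketched as follows. The sketch works on strings only and leaves out the file reading and writing; the regular expression used here is a simplified assumption rather than the expression quoted above, the Map-based signature of insert is hypothetical, and XML escaping of the injected values is not handled.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of label replacement in a template string. */
final class TemplateUtil {
    // Assumed label syntax: ${name}; the capture group holds the name.
    private static final Pattern LABEL = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces every ${name} in the template text with values.get(name). */
    static String insert(String template, Map<String, String> values) {
        Matcher m = LABEL.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Labels without a mapped value are left untouched rather than dropped.
            String value = values.getOrDefault(m.group(1), m.group());
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

A single pass with appendReplacement avoids re-scanning text that has already been substituted, so a value that itself contains "${" is not expanded a second time.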
The last step of inserting content into a DOCX-format document is packaging the OOXML file. In the OOXML format standard, a DOCX document is encoded in Unicode and compressed in ZIP format according to the OPC (Open Packaging Conventions). Herein the ZIP compression function is implemented in the ZipCompressor class. Logically, a DOCX document is an OPC package, and the package is a set of parts. Each part is identified by a case-insensitive path name, a character string such as "/pres/slides/slide1.xml" whose segments are separated by forward slashes "/", and each part has its own specific content type. Physically, a ZIP file packaged according to the OPC is an OPC package: each ZIP item corresponds to one part of the package, and the path name of the item is consistent with the path name of the part. Within the OPC package, "/[Content_Types].xml" defines the content type of each part. There are also explicit mapping relationships between the parts of the package, and these are stored in relationship parts. Every relationship part is named in the form ".../_rels/....rels"; for example, if the path name of a part is "/a/b/c.xml", the path name of its relationship part is "/a/b/_rels/c.xml.rels". The main document content of the package is recorded in the document part, which stores it in the file "/document.xml".
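The packaging step and the naming rule for relationship parts can be illustrated with the standard java.util.zip classes. This is a sketch under assumptions: the pack and relationshipsPartFor methods are hypothetical names, the parts are held in memory as strings, and the leading slash of each part name is dropped when the ZIP item is written, as is customary for DOCX archives.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch of packaging OPC parts into a ZIP archive. */
final class ZipCompressor {
    /** Packs a map of part name to XML text into ZIP bytes; part names start with "/". */
    static byte[] pack(Map<String, String> parts) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> part : parts.entrySet()) {
                // The ZIP item name is the part name without its leading slash.
                zip.putNextEntry(new ZipEntry(part.getKey().substring(1)));
                zip.write(part.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Maps "/a/b/c.xml" to "/a/b/_rels/c.xml.rels", following the rule in the text. */
    static String relationshipsPartFor(String partName) {
        int slash = partName.lastIndexOf('/');
        return partName.substring(0, slash) + "/_rels/" + partName.substring(slash + 1) + ".rels";
    }
}
```

Passing a LinkedHashMap to pack keeps the items in insertion order, which makes the resulting archive layout predictable.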
The following describes the design principle and implementation of DocumentScript, a scripting language designed and developed herein for the automatic generation of document content.
Processing flow of the language processor
A language processor goes through three basic steps: lexical analysis, syntactic analysis, and execution (or generation of machine language). Lexical analysis, also called tokenization, cuts the source code into a sequence of words (Tokens). Syntactic analysis then works out the hierarchical logical relationships between the words and produces a number of abstract syntax trees (AST, Abstract Syntax Tree). In execution (or generation of machine language), the interpreter interprets and executes the abstract syntax trees one by one and finally returns the result of execution.
Design of the lexer
The first step in implementing a language processor is to implement a lexer (Lexer). Unprocessed program source code can be regarded as one long character string made up of a series of short character strings. The job of the lexer is to split this long string into individual Tokens.
The Tokens of the DocumentScript language fall into four classes: string tokens, number tokens, identifier tokens and the end-of-file token. String tokens and number tokens are easy to understand: they are the character sequences that represent strings and numbers. Note that a character string enclosed in quotation marks ("), such as "123", is a string token and not a number token, even if the part between the quotation marks could represent a number. Identifier tokens are the keywords used in a program, braces "{}", square brackets "[]", parentheses "()", semicolons ";", variable names and so on. In addition to these three kinds of Token that carry actual meaning, DocumentScript also defines a special end-of-file (EOF) token that marks the end of a code file.
A word is defined herein as an abstract class Token, whose fields and methods are listed in the tables below. EOF is the Token that represents the end of file, and EOL is the line-break character defined in DocumentScript. The three methods isNumber, isIdentifier and isString test whether a Token is of number, identifier or string type, so the concrete type of a Token can be determined through them. The getText and getNumber methods return the character string and the numerical value held in the Token object. Apart from getLineNumber, all methods of the Token class are abstract and are implemented one by one in its subclasses.
Token class field summary table:
Token class method summary table:
Three subclasses of the parent class Token are defined herein: StrToken, NumToken and IdToken, which represent string tokens, number tokens and identifier tokens respectively. Because the end-of-file (EOF) Token only marks the end of a code file and has a simple structure, it is not implemented as a subclass of Token in a separate file; instead it is embedded in the Token class as a static member, following the singleton pattern.
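The class layout described above might look like the following sketch. The field and method tables are not reproduced in this text, so the constructor signatures, the value of EOL, the use of int for numbers and the line number -1 for EOF are assumptions made for illustration.

```java
/** Hypothetical sketch of the Token hierarchy; only getLineNumber is concrete. */
abstract class Token {
    /** Line-break marker; the concrete value is an assumption. */
    static final String EOL = "\\n";

    /** End-of-file token: a single shared instance embedded in Token. */
    static final Token EOF = new Token(-1) {
        boolean isNumber() { return false; }
        boolean isIdentifier() { return false; }
        boolean isString() { return false; }
        String getText() { return ""; }
        int getNumber() { throw new UnsupportedOperationException("not a number token"); }
    };

    private final int lineNumber;
    protected Token(int lineNumber) { this.lineNumber = lineNumber; }
    int getLineNumber() { return lineNumber; }

    abstract boolean isNumber();
    abstract boolean isIdentifier();
    abstract boolean isString();
    abstract String getText();
    abstract int getNumber();
}

final class NumToken extends Token {
    private final int value;
    NumToken(int line, int value) { super(line); this.value = value; }
    boolean isNumber() { return true; }
    boolean isIdentifier() { return false; }
    boolean isString() { return false; }
    String getText() { return Integer.toString(value); }
    int getNumber() { return value; }
}

final class StrToken extends Token {
    private final String literal;
    StrToken(int line, String literal) { super(line); this.literal = literal; }
    boolean isNumber() { return false; }
    boolean isIdentifier() { return false; }
    boolean isString() { return true; }
    String getText() { return literal; }
    int getNumber() { throw new UnsupportedOperationException("not a number token"); }
}

final class IdToken extends Token {
    private final String text;
    IdToken(int line, String text) { super(line); this.text = text; }
    boolean isNumber() { return false; }
    boolean isIdentifier() { return true; }
    boolean isString() { return false; }
    String getText() { return text; }
    int getNumber() { throw new UnsupportedOperationException("not a number token"); }
}
```

Under this layout a string token built from "123" answers true to isString and false to isNumber, which matches the rule that quoted digits are not a number token.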
A complete piece of program code can be split into a sequence of the four kinds of Token described above, and this splitting is done by the lexer (lexical analyzer). The lexer is defined herein as the Lexer class, which performs tokenization by regular expression matching.
The Lexer class has five string-type fields. comPat, numPat, strPat and idPat are the regular expressions that match comments, number Tokens, string Tokens and identifier Tokens respectively, while the regexPat expression matches every legal character string in DocumentScript. When the parsing process of the Lexer class runs, the lexer reads the source code line by line, checks the content of each line against regexPat from left to right, and extracts every matched character string. If a matched string (excluding leading blank characters) matches comPat, it is a comment. If it is a number literal, a string literal or an identifier, it matches numPat, strPat or idPat. A Token whose type has been determined is stored in the queue of Tokens to be returned. The remainder of the line is then processed in the same way, and this is repeated until the source code ends, at which point the Lexer has split the source code into a Token queue. In the Lexer class described herein this process is implemented mainly by the readLine and addToken methods.
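The matching loop can be sketched as follows. The actual expressions of DocumentScript are not given in this text, so all four patterns below are illustrative guesses, the class name LexerSketch is hypothetical, and the tokens are represented as plain "KIND:text" strings instead of Token objects to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of line-by-line tokenization with one combined pattern. */
final class LexerSketch {
    // Illustrative patterns only; they are not the expressions of the patent.
    static final String comPat = "//.*";
    static final String numPat = "[0-9]+";
    static final String strPat = "\"(?:\\\\.|[^\"\\\\])*\"";
    static final String idPat = "[A-Za-z_][A-Za-z_0-9]*|==|<=|>=|&&|\\|\\||\\p{Punct}";
    // Leading blanks, then one lexeme (groups 1 to 4) or the end of the line.
    static final Pattern regexPat = Pattern.compile(
        "\\s*(?:(" + comPat + ")|(" + numPat + ")|(" + strPat + ")|(" + idPat + ")|$)");

    /** Splits one source line into "KIND:text" entries, skipping comments. */
    static List<String> readLine(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = regexPat.matcher(line);
        int pos = 0;
        while (pos < line.length()) {
            m.region(pos, line.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("bad token at column " + pos);
            }
            addToken(m, tokens);
            pos = m.end();   // continue with the rest of the line
        }
        return tokens;
    }

    /** Classifies the match by the capture group that took part in it. */
    private static void addToken(Matcher m, List<String> tokens) {
        if (m.group(1) != null) return;   // comment: nothing to add
        if (m.group(2) != null) tokens.add("NUM:" + m.group(2));
        else if (m.group(3) != null) tokens.add("STR:" + m.group(3));
        else if (m.group(4) != null) tokens.add("ID:" + m.group(4));
    }
}
```

Because the alternatives are tried in order, a comment or a quoted string is recognized before the generic punctuation branch of idPat can claim its first character.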
Lexer class field summary table
The methods of the Lexer class are summarized in the table below. The Lexer class has a constructor that takes a parameter of type Reader, and a Lexer object obtains the source code from the Reader object it receives. The Lexer class also defines two methods, read and peek, which drive the lexical analysis. The read method fetches Tokens one by one starting from the head of the source code and returns a new Token each time it is called. The peek method is used to read Tokens ahead: peek(i) returns the i-th Token after the Token that the read method is about to return. Once the source code has been read to the end, both read and peek return Token.EOF.
Lexer class method summary table:
For tokenization alone, a read method would be all that the Lexer class needs. During the syntactic analysis that follows tokenization, however, the Lexer class must also provide a peek method. Syntactic analysis builds the abstract syntax tree while fetching Tokens, and building the tree is a depth-first process with backtracking: if a construction error is discovered part-way through, the parser has to step back several words and rebuild. Backtracking in this process is supported as follows:
A peek method and a buffer queue for temporarily storing Tokens are provided. When the abstract syntax tree is built, the peek method is first used to learn which Tokens will be read next and to store them in the buffer queue; the content of the buffer is then examined; and finally the Tokens are fetched with the read method to build the abstract syntax tree.
Performance comparison of the methods for building the abstract syntax tree:
Method | Time complexity | Space complexity |
---|---|---|
Method one | O(n^2) | O(1) |
Method two | O(n) | O(n) |
Method three | O(n) | O(1) to O(n) |
In the concrete implementation of the Lexer class, the read method checks on every call whether the buffer queue is empty; if it is, one Token is added to the queue. The Token at the head of the queue is then returned and removed from the queue. The peek method, on each call, stores the Tokens that are read ahead in the buffer queue and returns the requested one as its return value without deleting any element from the queue. Both read and peek rely on the fillQueue method to add Tokens to the buffer queue. fillQueue has one parameter of type int and a return value of type boolean: the parameter gives the number of Tokens to read in, and the return value indicates whether the queue was filled successfully. The method normally returns true and returns false only when all of the program code has been read.
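The interplay of read, peek and fillQueue can be sketched as below. To keep the example self-contained, tokens are plain strings supplied by an Iterator and the string "<EOF>" stands in for Token.EOF; the class name BufferedTokenReader is hypothetical, and the sketch assumes that the argument of fillQueue is the minimum number of Tokens the queue must hold.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of a token reader with look-ahead through a buffer queue. */
final class BufferedTokenReader {
    static final String EOF = "<EOF>";                 // stand-in for Token.EOF
    private final Iterator<String> source;             // stand-in for the tokenizer
    private final List<String> queue = new ArrayList<>();

    BufferedTokenReader(Iterator<String> source) { this.source = source; }

    /** Returns the next token and removes it from the buffer queue. */
    String read() {
        return fillQueue(1) ? queue.remove(0) : EOF;
    }

    /** Returns the token i positions after the next read() without consuming anything. */
    String peek(int i) {
        return fillQueue(i + 1) ? queue.get(i) : EOF;
    }

    /** Ensures the queue holds at least n tokens; false once the source is exhausted. */
    private boolean fillQueue(int n) {
        while (queue.size() < n) {
            if (!source.hasNext()) return false;
            queue.add(source.next());
        }
        return true;
    }
}
```

In this sketch peek(0) is the token that the next call to read will return, so a parser can inspect upcoming tokens and still consume them later in the normal order.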
Design of the syntax analyzer
After lexical analysis the next task is syntactic analysis, which assembles the Token sequence from a simple linear structure into a tree structure according to the grammar rules of the language; in short, it builds the abstract syntax tree. The abstract syntax tree is defined as an interface named ASTree, whose methods are summarized in the table below.
ASTree interface method summary table:
ASTree is an interface definition of the abstract syntax tree rather than a concrete class, so in order to describe an abstract syntax tree completely, a number of node classes that implement the ASTree interface are also designed herein. Their functions are described in the table below.
Abstract syntax tree node type explanation:
The node classes that implement the abstract syntax tree are divided into leaf nodes and non-leaf nodes, defined herein as the ASTLeaf class and the ASTList class respectively. As the names suggest, a leaf node contains no child nodes, whereas a non-leaf node may contain child nodes. There are four leaf node classes: Name, NumberLiteral, StringLiteral and NullStmnt, all of which inherit from ASTLeaf. All non-leaf node classes inherit from ASTList and fall into three categories: the first is used for program flow control and comprises the BlockStmnt, IfStmnt and WhileStmnt classes, which represent the sequential, branch and loop structures respectively; the second is used for expression processing; and the third is used for function control.
Design of the interpreter:
After syntactic analysis the interpreter executes the program. Once the abstract syntax trees have been built, execution is fairly simple: the interpreter only needs to evaluate each abstract syntax tree. Evaluation traverses the whole tree recursively from the root node down to the leaf nodes. Every node visited yields an evaluation result, and except for leaf nodes the result of a node depends on the results of its child nodes.
For evaluation to follow the abstract syntax tree, the class of every node object in the tree must provide an evaluation method. This method, eval, is defined in ASTree in the form public abstract Object eval(Environment env), and every subclass that inherits from ASTree must implement it. Consequently, calling the eval method of the root node object of an abstract syntax tree executes the whole program that the tree represents.
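The recursive evaluation can be illustrated with a deliberately small tree. Only the eval signature is taken from the text; the empty Environment interface, the int-valued NumberLiteral constructor and the SumNode class are hypothetical stand-ins introduced for this sketch.

```java
import java.util.Arrays;
import java.util.List;

/** Placeholder for the environment object; its members are omitted in this sketch. */
interface Environment { }

abstract class ASTree {
    public abstract Object eval(Environment env);
}

/** Leaf node: evaluates to the literal value it holds. */
final class NumberLiteral extends ASTree {
    private final int value;
    NumberLiteral(int value) { this.value = value; }
    @Override public Object eval(Environment env) { return value; }
}

/** Non-leaf node (hypothetical type): its result depends on the results of its children. */
final class SumNode extends ASTree {
    private final List<ASTree> children;
    SumNode(ASTree... children) { this.children = Arrays.asList(children); }
    @Override public Object eval(Environment env) {
        int total = 0;
        for (ASTree child : children) {
            total += (Integer) child.eval(env);   // recursive descent toward the leaves
        }
        return total;
    }
}
```

Calling eval on the root of a tree such as SumNode(1, SumNode(2, 3)) visits every node once and combines the child results on the way back up.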
DocumentScript is a scripting language that supports variable definitions and therefore involves variable scope, so an environment object is passed to the eval method at execution time. In short, an environment object is a data structure that records the correspondence between variable names and variable values; it is defined herein as the Evironment interface. When a program adds a new variable, a key-value pair consisting of the variable's name and initial value is added to the current environment object, and when the variable is used later the program retrieves its value from that environment object. To assign a new value to the same variable, the scope in which the variable is defined must first be found, and the value is then updated in the environment object of that scope.
The Evironment interface is implemented by the BasicEnv class. In BasicEnv, the values object is a HashMap that stores the key-value pairs, and the outer object is the parent environment of the current environment. A child environment inherits from its parent: the child can access variables defined in the parent, but the parent cannot access variables defined in the child. Variable assignment and lookup are straightforward and are implemented by the put and get methods. Note that put and putNew differ. putNew adds or modifies the variable directly in the current environment object. put first determines the scope of the variable: if the variable is defined in a parent environment, its value is modified there; if it is defined in the current environment or is not defined in any environment, putNew is called to add or modify it in the current environment. Finding the scope in which a variable is defined is delegated to the where method. where first checks whether the variable is defined in the current environment; if not, it recursively calls the where method of the parent environment to check whether the variable is defined there; and if the variable is not defined in the current environment and the current environment has no parent, it returns null.
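The scoping rules of put, putNew, get and where can be sketched as follows. The method and field names follow the text; the constructor that takes the parent environment is an assumption, and the interface that BasicEnv implements is left out to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a nested environment for variable lookup. */
final class BasicEnv {
    private final Map<String, Object> values = new HashMap<>();
    private final BasicEnv outer;   // parent environment, or null at the top level

    BasicEnv(BasicEnv outer) { this.outer = outer; }

    /** Looks the name up here first, then in the enclosing environments. */
    Object get(String name) {
        BasicEnv env = where(name);
        return env == null ? null : env.values.get(name);
    }

    /** Adds or overwrites the variable in this environment only. */
    void putNew(String name, Object value) { values.put(name, value); }

    /** Updates the variable where it is defined; defines it here if it is new. */
    void put(String name, Object value) {
        BasicEnv env = where(name);
        (env == null ? this : env).putNew(name, value);
    }

    /** Returns the nearest environment that defines the name, or null if none does. */
    BasicEnv where(String name) {
        if (values.containsKey(name)) return this;
        return outer == null ? null : outer.where(name);
    }
}
```

With a global environment and a local one nested inside it, put("x", 2) in the local environment updates a global x if one exists, whereas putNew("x", 9) always creates or overwrites x locally and shadows the global value.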
As mentioned above, ASTree is abstract and does not implement its eval method, so the concrete evaluation methods are implemented in its subclasses.
Among the many subclasses of ASTree, the Name, NumberLiteral, StringLiteral and NullStmnt classes are leaf node classes, that is, subclasses of ASTLeaf. Name represents a user-defined variable, so its eval method fetches the value from the environment object and throws an exception if the variable is not defined there. NumberLiteral and StringLiteral are literal node classes, so their eval methods simply return the basic value held in their Token; the implementation is simple and is not described further. NullStmnt represents an empty statement and has no return value, so it does not implement eval itself but inherits the eval method of ASTLeaf, which throws an exception if it is called.
To implement the sequential, branch and loop flow control of a program, all flow control classes among the non-leaf node classes also implement the eval method.
When the eval method of the IfStmnt class runs, it first executes the eval method of the condition (condition) of the current object. If the result is true, it executes the eval method of the affirmative code block (thenBlock); if the result is false and the current IfStmnt object contains a negative code block (elseBlock), it executes the eval method of the negative code block. The condition, the affirmative code block and the negative code block are obtained through the condition, thenBlock and elseBlock methods respectively, whose return values come from calling child(0), child(1) and child(2) respectively.
The eval method of the WhileStmnt class is a loop. Each iteration first evaluates the loop condition (condition); if the result is true the eval method of the loop body (body) is executed, otherwise the loop exits. The loop condition and the loop body are obtained by calling the condition and body methods respectively.
The BlockStmnt class simply represents a code block, so its evaluation consists of executing, in order, the eval methods of all the nodes it contains. There is no other logic, so the implementation of its eval method is not described further here.
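The three flow control classes can be sketched together as follows. The accessor names condition, thenBlock, elseBlock, body and child follow the text; everything else is assumed for illustration: a Map stands in for the environment object, numChildren is a hypothetical helper, a condition counts as true only when it evaluates to Boolean.TRUE, and a block or loop returns the value of the last statement executed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

abstract class ASTree {
    abstract Object eval(Map<String, Object> env);   // Map stands in for the environment
}

abstract class ASTList extends ASTree {
    private final List<ASTree> children;
    ASTList(ASTree... children) { this.children = Arrays.asList(children); }
    ASTree child(int i) { return children.get(i); }
    int numChildren() { return children.size(); }
}

/** Sequential structure: evaluates the contained nodes in order. */
final class BlockStmnt extends ASTList {
    BlockStmnt(ASTree... statements) { super(statements); }
    Object eval(Map<String, Object> env) {
        Object result = null;
        for (int i = 0; i < numChildren(); i++) result = child(i).eval(env);
        return result;
    }
}

/** Branch structure: children are condition, thenBlock and an optional elseBlock. */
final class IfStmnt extends ASTList {
    IfStmnt(ASTree... parts) { super(parts); }
    ASTree condition() { return child(0); }
    ASTree thenBlock() { return child(1); }
    ASTree elseBlock() { return numChildren() > 2 ? child(2) : null; }
    Object eval(Map<String, Object> env) {
        if (Boolean.TRUE.equals(condition().eval(env))) return thenBlock().eval(env);
        return elseBlock() == null ? null : elseBlock().eval(env);
    }
}

/** Loop structure: re-evaluates the condition before every iteration. */
final class WhileStmnt extends ASTList {
    WhileStmnt(ASTree condition, ASTree body) { super(condition, body); }
    ASTree condition() { return child(0); }
    ASTree body() { return child(1); }
    Object eval(Map<String, Object> env) {
        Object result = null;
        while (Boolean.TRUE.equals(condition().eval(env))) result = body().eval(env);
        return result;
    }
}
```

Each eval method only delegates to the eval methods of its children, which is what allows the interpreter to run a whole program by evaluating the root node.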
An expression necessarily has a return value, so all expression processing classes among the non-leaf node classes implement eval. The definition of an expression is fairly broad: it may be an ordinary literal or an operator expression. The evaluation of number literals and string literals was described above and is not repeated here; the implementation of eval for operator expressions is explained below.
Operator expressions are divided into unary expressions and binary expressions. Of the unary expressions, DocumentScript implements evaluation only for the negation expression and the negative-value expression. Both are simple: the original value is negated, or its sign is inverted, and the result is returned, so the implementation is not described further. It is worth noting, however, that the type of the operand must be checked before a unary operation is executed: the sign-inversion operation can only be applied to numeric objects, and the negation operation can only be applied to integer objects.
A binary operator expression is defined herein as the BinaryExpr class. In BinaryExpr, the left, right and operator methods return the left operand, the right operand and the operator respectively. When the eval method runs, the program first determines whether the current binary expression is an assignment or a computation: for an assignment it calls the computeAssign method, and for a computation it calls the computeOp method. The computeOp method first checks the type of the left and right operands: operands of string type are handled directly in the method itself, while operands of numeric type are passed to the computeNumber method.
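The dispatch inside BinaryExpr can be sketched as follows. The method names left, right, operator, computeAssign, computeOp and computeNumber follow the text; the rest is assumed for the example: a Map stands in for the environment object, the single Literal class covers both kinds of literal, numbers are ints, and only a handful of operators are handled.

```java
import java.util.Map;

abstract class ASTree {
    abstract Object eval(Map<String, Object> env);   // Map stands in for the environment
}

/** Hypothetical leaf that covers both number and string literals. */
final class Literal extends ASTree {
    private final Object value;
    Literal(Object value) { this.value = value; }
    Object eval(Map<String, Object> env) { return value; }
}

final class Name extends ASTree {
    final String name;
    Name(String name) { this.name = name; }
    Object eval(Map<String, Object> env) {
        if (!env.containsKey(name)) throw new IllegalStateException("undefined name: " + name);
        return env.get(name);
    }
}

final class BinaryExpr extends ASTree {
    private final ASTree left, right;
    private final String operator;
    BinaryExpr(ASTree left, String operator, ASTree right) {
        this.left = left; this.operator = operator; this.right = right;
    }
    ASTree left() { return left; }
    String operator() { return operator; }
    ASTree right() { return right; }

    Object eval(Map<String, Object> env) {
        // Assignment and computation take separate paths.
        if ("=".equals(operator)) return computeAssign(env, right.eval(env));
        return computeOp(left.eval(env), right.eval(env));
    }

    private Object computeAssign(Map<String, Object> env, Object value) {
        if (!(left instanceof Name)) throw new IllegalStateException("bad assignment target");
        env.put(((Name) left).name, value);
        return value;
    }

    private Object computeOp(Object l, Object r) {
        if (l instanceof Integer && r instanceof Integer) return computeNumber((Integer) l, (Integer) r);
        // Non-numeric operands are handled here; only + and == are sketched.
        if ("+".equals(operator)) return String.valueOf(l) + r;
        if ("==".equals(operator)) return l == null ? r == null : l.equals(r);
        throw new IllegalStateException("bad operand types for " + operator);
    }

    private Object computeNumber(int a, int b) {
        switch (operator) {
            case "+": return a + b;
            case "-": return a - b;
            case "*": return a * b;
            case "/": return a / b;
            case "==": return a == b;
            case "<": return a < b;
            case ">": return a > b;
            default: throw new IllegalStateException("unknown operator " + operator);
        }
    }
}
```

In this sketch an assignment evaluates its right-hand side first and stores the result under the name on the left, so nested expressions such as x = 2 * 21 work without extra machinery.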
The above embodiments are intended to illustrate the present invention, not to limit it. Any modification or change made to the present invention within the spirit of the invention and the scope of protection of the claims falls within the scope of protection of the present invention.
Claims (10)
1. A cross-platform efficient automatic generation method for heterogeneous data bulletins, characterized in that it comprises the following steps:
(1) processing of massive heterogeneous data: the data are managed with the SX404DB key-value centralized database, the SX404DB key-value database being a key-value NoSQL database based on inverted index technology;
(2) automatic generation of bulletin content: the bulletin content is generated dynamically by the DocumentScript script control system;
(3) automatic typesetting of the bulletin format: this is accomplished by injecting the content into a format template based on Office Open XML and compressing it into a DOCX-format document.
2. The cross-platform efficient automatic generation method for heterogeneous data bulletins according to claim 1, characterized in that managing data with the SX404DB key-value centralized database specifically comprises: creating a database Session object to initiate a database session with the SX404DB database; specifically, first creating an entity object and then setting several attributes of the entity object, namely code, type, region and time; and then calling the query method of the session object with this combined condition as the parameter, to query all entity objects that satisfy the corresponding conditions.
3. The cross-platform efficient automatic generation method for heterogeneous data bulletins according to claim 1, characterized in that managing data with the SX404DB key-value centralized database is specifically carried out by the following packages contained in SX404DB:
the convertor package, which implements format conversion between data objects;
the Directory package, which implements index directory management;
the Index package, which implements querying and modification of the index;
the Properties package, which implements the configuration of the database;
the Session package, which manages database access sessions;
the Condition package, which implements the condition functions in data operations;
the Sort package, which implements the sorting function in data queries;
wherein the user accesses resources in the database through the overloaded save, delete, update and query methods of the Session class, and the Session class depends on the Searcher, Processer and DocumentConvertor classes;
the Session class queries data through the Searcher class, modifies data through the Processer class, and converts data object formats through the DocumentConvertor class;
the ConcurrentDirectory class is in an aggregation relationship with the Searcher and Processer classes;
an object of the ConcurrentDirectory class appears as an attribute in both the Searcher class and the Processer class;
the Processer class provides the following methods: the delete method performs logical deletion of data, and when delete is called the affected data enter a recycle area, which is emptied by the clearTrash method; the forceDelete method is a physical deletion method, and physically deleted data cannot be recovered; the insert method adds data; and the update method modifies data;
the whole SX404DB database provides only one ConcurrentDirectory instance for each file path, and storage, querying, modification and deletion are implemented by operating on the ConcurrentDirectory instance; the thread lock of ConcurrentDirectory uses read-write separation, and all write operations are executed synchronously in a queue.
4. The cross-platform efficient automatic generation method for heterogeneous data bulletins according to claim 1, characterized in that managing data with the SX404DB key-value centralized database specifically means that every complete data object is stored in units of the Document class, and each Document object contains several Filed members.
5. The cross-platform efficient automatic generation method for heterogeneous data bulletins according to claim 1, characterized in that dynamically generating the bulletin content by the DocumentScript script control system specifically comprises:
(1) lexical analysis: cutting the source code into a number of words;
(2) syntactic analysis: working out the hierarchical logical relationships between the words and generating a number of abstract syntax trees;
(3) generation of machine language: an interpreted language processor is used, and the interpreter interprets and executes the abstract syntax trees one by one and returns the result of execution.
6. The cross-platform efficient automatic generation method for heterogeneous data bulletins according to claim 5, characterized in that the lexical analysis implements a lexer of class Lexer that splits the source code character string into Tokens, tokenization being performed by regular expression matching; specifically:
the Tokens are divided into a string token class, a number token class, an identifier token class and an end-of-file token class, the end-of-file Token being embedded in the Token class as a static member using the singleton pattern;
the Lexer class has the string-type fields comPat, numPat, strPat and idPat, which are the regular expressions that match comments, number Tokens, string Tokens and identifier Tokens respectively; the Lexer class also has a string-type field regexPat, whose expression matches every legal character string in DocumentScript; when the parsing process of the Lexer class runs, the lexer reads the source code line by line, checks the content of each line against regexPat from left to right, and extracts every matched character string;
a Lexer object obtains the source code from the Reader object it receives;
during the syntactic analysis that follows tokenization, the Lexer class provides a peek method; building the abstract syntax tree is a depth-first process with backtracking in which, if a construction error is discovered part-way through, several words have to be stepped back and the construction redone; specifically, a peek method and a buffer queue for temporarily storing Tokens are provided; when the abstract syntax tree is built, the peek method is first used to learn which Tokens will be read next and to store them in the buffer queue, the content of the buffer is then examined, and finally the Tokens are fetched with the read method to build the abstract syntax tree;
in the concrete implementation of the Lexer class, the read method checks on every call whether the buffer queue is empty and, if so, adds one Token to the buffer queue, then returns the Token at the head of the buffer queue and removes it from the queue; the peek method, on each call, stores the Tokens read ahead in the buffer queue and returns the requested one as its return value without deleting any element from the buffer queue;
generating a number of abstract syntax trees specifically means assembling the Token sequence from a simple linear structure into a tree structure according to the grammar rules of the language, the abstract syntax tree being defined as an interface named ASTree, with a number of node classes that implement the ASTree interface.
7. The cross-platform efficient automatic generation method for heterogeneous data bulletins according to claim 5, characterized in that the interpreter interpreting and executing the abstract syntax trees one by one and returning the result of execution specifically means that the interpreter evaluates each abstract syntax tree; the evaluation traverses the whole abstract syntax tree recursively from the root node down to the leaf nodes; every node visited yields an evaluation result, and except for leaf nodes the result of a node depends on the results of its child nodes.
8. The cross-platform efficient automatic generation method for heterogeneous data bulletins according to claim 6, characterized in that the lexer reading the source code line by line specifically means: when a matched character string, excluding leading blank characters, matches comPat, the character string is a comment; when the character string is a number literal, a string literal or an identifier, it matches numPat, strPat or idPat; a Token whose type has been determined is stored in the queue of Tokens to be returned; the remainder is then processed in the same way, and this is repeated until the source code ends, at which point the Lexer has split the source code into a Token queue.
9. The cross-platform efficient automatic generation method for heterogeneous data bulletins according to claim 6, characterized in that the node classes that implement the ASTree interface are divided into leaf nodes and non-leaf nodes; a leaf node contains no child nodes, and a non-leaf node may contain child nodes; the leaf nodes comprise four classes, namely the Name class, the NumberLiteral class, the StringLiteral class and the NullStmnt class, all of which inherit from the leaf node class; the non-leaf nodes all inherit from the non-leaf node class and are divided into three categories: the first category is used for program flow control and comprises a sequential structure class, a branch structure class used for expression processing, and a loop structure class used for function control.
10. The cross-platform efficient automatic generation method for heterogeneous data bulletins as claimed in claim 6, characterized in that the Lexer object obtains the source code by receiving a Reader object and specifically comprises two methods, read and peek; the read method obtains Tokens one by one starting from the head of the source code and returns a new Token each time it is called; the peek method is used to pre-read Tokens, such that peek(i) returns the i-th Token after the Token that the read method will return next; once the source code has been read completely, both the read method and the peek method return Token.EOF.
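The read and peek behaviour in claim 10 can be sketched in Java with a lazily filled token queue. This is an illustration under stated assumptions rather than the patent's implementation: the class name `LexerSketch` is hypothetical, Tokens are plain strings split on whitespace, and the constant `EOF` stands in for Token.EOF.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a lexer with lookahead; not the patent's actual code.
class LexerSketch {
    static final String EOF = "<EOF>"; // stands in for Token.EOF

    private final BufferedReader reader;
    private final List<String> queue = new ArrayList<>();
    private boolean hasMore = true;

    LexerSketch(Reader r) { this.reader = new BufferedReader(r); }

    // Removes and returns the next token, or EOF when the source is exhausted.
    String read() {
        return fillQueue(0) ? queue.remove(0) : EOF;
    }

    // Returns the i-th token after the one read() would return, without consuming it.
    String peek(int i) {
        return fillQueue(i) ? queue.get(i) : EOF;
    }

    // Reads further lines until the queue holds at least i + 1 tokens.
    private boolean fillQueue(int i) {
        while (i >= queue.size()) {
            if (!hasMore) return false;
            readLine();
        }
        return true;
    }

    private void readLine() {
        String line;
        try {
            line = reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (line == null) {
            hasMore = false;
            return;
        }
        // Simplification: split on whitespace instead of matching token patterns.
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) queue.add(word);
        }
    }

    public static void main(String[] args) {
        LexerSketch lexer = new LexerSketch(new StringReader("a = 1\nb = 2"));
        System.out.println(lexer.peek(3)); // prints b
        System.out.println(lexer.read());  // prints a
    }
}
```

Because lines are read only when the queue is too short, peek(i) can look across a line boundary without consuming anything, and both methods fall back to the EOF marker once the Reader is drained.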
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810811216.4A CN108959626B (en) | 2018-07-23 | 2018-07-23 | Efficient automatic generation method for cross-platform heterogeneous data profile |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108959626A (en) | 2018-12-07 |
CN108959626B (en) | 2023-06-13 |
Family
ID=64464317
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810811216.4A (Active; CN108959626B (en)) | Efficient automatic generation method for cross-platform heterogeneous data profile | 2018-07-23 | 2018-07-23 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108959626B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101477568A (en) * | 2009-02-12 | 2009-07-08 | 清华大学 | Integrated retrieval method for structured data and non-structured data |
CN104615526A (en) * | 2014-12-05 | 2015-05-13 | 北京航空航天大学 | Monitoring system of large data platform |
EP3107014A1 (en) * | 2015-06-15 | 2016-12-21 | Palantir Technologies, Inc. | Data aggregation and analysis system |
CN105468571A (en) * | 2015-11-19 | 2016-04-06 | 中国地质大学(武汉) | Method and device used for automatically generating report |
CN105912633A (en) * | 2016-04-11 | 2016-08-31 | 上海大学 | Sparse sample-oriented focus type Web information extraction system and method |
CN106484767A (en) * | 2016-09-08 | 2017-03-08 | 中国科学院信息工程研究所 | A kind of event extraction method across media |
CN106649455A (en) * | 2016-09-24 | 2017-05-10 | 孙燕群 | Big data development standardized systematic classification and command set system |
Non-Patent Citations (2)
Title |
---|
R BETIK et al.: "Automatic Generation of Sythetic XML Document", 《DIPLOVMOVA PRACE UNIVERZITA KARLOVA》 * |
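Li Lei: "Development of a Hadoop-based RSS content crawling and typesetting system", 《China Master's Theses Full-text Database, Information Science and Technology》 * |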
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109977547A (en) * | 2019-03-27 | 2019-07-05 | 北京金和网络股份有限公司 | Big data bulletin generation method based on dynamic modeling |
CN110610068A (en) * | 2019-09-16 | 2019-12-24 | 郑州昂视信息科技有限公司 | Method and device for application isomerization |
CN110610068B (en) * | 2019-09-16 | 2021-11-23 | 郑州昂视信息科技有限公司 | Method and device for application isomerization |
CN111143403A (en) * | 2019-12-10 | 2020-05-12 | 跬云(上海)信息科技有限公司 | SQL conversion method and device and storage medium |
CN111143403B (en) * | 2019-12-10 | 2021-05-14 | 跬云(上海)信息科技有限公司 | SQL conversion method and device and storage medium |
CN111539200A (en) * | 2020-04-22 | 2020-08-14 | 北京字节跳动网络技术有限公司 | Method, device, medium and electronic equipment for generating rich text |
CN111539200B (en) * | 2020-04-22 | 2023-08-18 | 北京字节跳动网络技术有限公司 | Method, device, medium and electronic equipment for generating rich text |
CN116450747A (en) * | 2023-06-16 | 2023-07-18 | 长沙数智科技集团有限公司 | Heterogeneous system collection processing system for office data |
CN116450747B (en) * | 2023-06-16 | 2023-08-29 | 长沙数智科技集团有限公司 | Heterogeneous system collection processing system for office data |
Also Published As
Publication number | Publication date |
---|---|
CN108959626B (en) | 2023-06-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108959626A (en) | A kind of cross-platform efficient automatic generation method of isomeric data bulletin | |
US6785685B2 (en) | Approach for transforming XML document to and from data objects in an object oriented framework for content management applications | |
US6611844B1 (en) | Method and system for java program storing database object entries in an intermediate form between textual form and an object-oriented form | |
US6606632B1 (en) | Transforming transient contents of object-oriented database into persistent textual form according to grammar that includes keywords and syntax | |
US5295256A (en) | Automatic storage of persistent objects in a relational schema | |
US6704747B1 (en) | Method and system for providing internet-based database interoperability using a frame model for universal database | |
US6609130B1 (en) | Method for serializing, compiling persistent textual form of an object-oriented database into intermediate object-oriented form using plug-in module translating entries according to grammar | |
US6298354B1 (en) | Mechanism and process to transform a grammar-derived intermediate form to an object-oriented configuration database | |
US6598052B1 (en) | Method and system for transforming a textual form of object-oriented database entries into an intermediate form configurable to populate an object-oriented database for sending to java program | |
US9009195B2 (en) | Software framework that facilitates design and implementation of database applications | |
US6542899B1 (en) | Method and system for expressing information from an object-oriented database in a grammatical form | |
US20040044687A1 (en) | Apparatus and method using pre-described patterns and reflection to generate a database schema | |
US20060271885A1 (en) | Automatic database entry and data format modification | |
US20040044989A1 (en) | Apparatus and method using pre-described patterns and reflection to generate source code | |
US8707260B2 (en) | Resolving interdependencies between heterogeneous artifacts in a software system | |
Bertino et al. | Modeling multilevel entities using single level objects | |
Opmanis et al. | Multilevel data repository for ontological and meta-modeling | |
Mueck et al. | Index data structures in object-oriented databases | |
Völkel | D2. 3.3. v2 SemVersion Versioning RDF and Ontologies | |
Rose et al. | Schema versioning in a temporal object-oriented data model | |
Schilling et al. | Standard-oriented ontology export of domain catalogues from data dictionaries | |
Baker | Design and implementation of database computations in Java | |
Batory et al. | Introductory P2 System Manual | |
Barskiy | Code-First Development with Entity Framework | |
Toman | Storing XML Data In a Native Repository. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||