CN107944012A - Knowledge data computing system, method, server and storage medium - Google Patents

Knowledge data computing system, method, server and storage medium

Info

Publication number
CN107944012A
CN107944012A · CN201711297667.2A
Authority
CN
China
Prior art keywords
data
batch
knowledge
knowledge data
pending
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711297667.2A
Other languages
Chinese (zh)
Inventor
王杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201711297667.2A priority Critical patent/CN107944012A/en
Publication of CN107944012A publication Critical patent/CN107944012A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention disclose a knowledge data computing system, method, server and storage medium. The system includes: a data storage module, for storing knowledge data and a knowledge graph; a data computation module, for providing developers with an interface for developing data processing policies and computing to-be-processed knowledge data according to the developer-defined data processing policy; and a knowledge graph update module, for updating the knowledge graph in the data storage module according to the processed data. The embodiments provide a general-purpose knowledge data computing framework that supports developer-defined data processing policies, reduces development cost, and supports the data processing workflows required in different scenarios.

Description

Knowledge data computing system, method, server and storage medium
Technical field
Embodiments of the present invention relate to data processing technology, and in particular to a knowledge data computing system, method, server and storage medium.
Background art
A knowledge graph (Knowledge Graph, KG) is a structured semantic knowledge base that describes, in symbolic form, the concepts of the physical world and the relations between them. Its basic unit is the "entity-relation-entity" triple, together with entities and their associated attribute-value pairs; entities are interconnected by relations, forming a web-like knowledge structure. With a knowledge graph, the Web can move from links between pages to links between concepts, and users can search by topic rather than by string, thereby achieving true semantic retrieval. A search engine based on a knowledge graph can return structured knowledge to the user graphically, so the user can locate and explore knowledge in depth without browsing large numbers of web pages.
The construction and updating of a knowledge graph are based on processing crawled knowledge data to obtain data that meets the requirements. Different developers may have different data computation needs, and at present each developer has to build a complete computing architecture according to their own needs, which makes development costly.
Summary of the invention
Embodiments of the present invention provide a knowledge data computing system, method, server and storage medium, so as to provide a general-purpose knowledge data computing framework that supports developer-defined data processing policies and reduces development cost.
In a first aspect, an embodiment of the present invention provides a knowledge data computing system, including:
a data storage module, for storing knowledge data and a knowledge graph;
a data computation module, for providing developers with an interface for developing data processing policies, and computing to-be-processed knowledge data according to the developer-defined data processing policy;
a knowledge graph update module, for updating the knowledge graph in the data storage module according to the processed data.
In a second aspect, an embodiment of the present invention further provides a knowledge data computing method, implemented on the knowledge data computing system described in any embodiment of the present invention, including:
obtaining to-be-processed knowledge data;
under the computing framework corresponding to the to-be-processed knowledge data, computing the to-be-processed knowledge data according to a developer-defined data processing policy, where the data processing policy is defined by the developer through a preset interface;
updating the knowledge graph in the data storage module according to the processed data.
In a third aspect, an embodiment of the present invention further provides a server, the server including:
one or more processors;
a memory, for storing one or more programs,
where, when the one or more programs are executed by the one or more processors, the one or more processors implement the knowledge data computing method described in any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the knowledge data computing method described in any embodiment of the present invention is implemented.
Embodiments of the present invention provide a general-purpose knowledge data computing framework and expose to developers an interface for developing data processing policies, so that policy developers can define their own data processing logic in the form of plug-ins, while data access is decoupled from computation. A developer does not need to build the whole architecture; they only need to focus on the specific computation logic of the data processing step and develop a stand-alone program, which saves development cost, and they can conveniently update and replace their data processing policy through the interface, i.e. the system supports arbitrarily changing data processing policies. The system also supports storing and reading/writing data of very high magnitude. In addition, the system supports developing data processing policies in multiple languages and makes containerized application deployments easy to scale out. Through the system, developers can create, publish and manage their own application instances.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the knowledge data computing system provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic structural diagram of the knowledge data computing system provided by Embodiment 2 of the present invention;
Fig. 3 is a schematic diagram of the layered architecture of the knowledge data computing system provided by Embodiment 3 of the present invention;
Fig. 4 is a schematic structural diagram of the Stone batch computing framework provided by Embodiment 3 of the present invention;
Fig. 5 is a schematic diagram of the knowledge data computation provided by Embodiment 3 of the present invention;
Fig. 6 is a flowchart of the knowledge data computing method provided by Embodiment 4 of the present invention;
Fig. 7 is a schematic structural diagram of the server provided by Embodiment 5 of the present invention.
Detailed description of embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 1 is a schematic structural diagram of the knowledge data computing system provided by Embodiment 1 of the present invention. This embodiment provides a general-purpose knowledge data computing framework and is applicable to scenarios in which knowledge data needs to be computed. As shown in Fig. 1, the system includes:
a data storage module 100, for storing knowledge data and a knowledge graph;
a data computation module 200, for providing developers with an interface for developing data processing policies, and computing to-be-processed knowledge data according to the developer-defined data processing policy;
a knowledge graph update module 300, for updating the knowledge graph in the data storage module 100 according to the processed data.
Wherein, the data storage module 100 may be implemented on a distributed database, so as to support storing graph data of very high magnitude (for example tens of billions or hundreds of billions of records), to support random reads and writes of the data, and to support horizontal scaling of the storage capacity. Optionally, the data storage module 100 is obtained by encapsulating HBase and Hadoop. The knowledge data stored in the data storage module 100 may come from a knowledge graph open platform; specifically, the knowledge graph open platform periodically crawls data and transfers the crawled data to the system, where it is stored in the data storage module 100. The storage format of the knowledge data and the knowledge graph is JSON-LD (JavaScript Object Notation for Linked Data) as defined by the W3C (World Wide Web Consortium).
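As an illustrative sketch only (the identifiers and property names below are assumptions and do not come from the original disclosure), a single knowledge record stored in W3C JSON-LD form might look like the following, expressing an entity with attribute-value pairs and a relation to another entity:

    {
      "@context": "https://schema.org",
      "@id": "kg:entity/person_001",
      "@type": "Person",
      "name": "Example Person",
      "birthDate": "1970-01-01",
      "knows": { "@id": "kg:entity/person_002" }
    }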
The data computation module 200 can be understood as the knowledge data computing framework; it provides a development interface and a data read interface, and also provides the ability to execute plug-ins and/or jobs. A developer can define their own data processing policy through the interface provided by the data computation module 200; the interface can support multiple languages, such as C++ and Python. A data processing policy refers to the concrete computation method applied to the knowledge data, and a data processing policy may take the form of a plug-in. If the policy needs to change, the developer can simply and conveniently update it through the interface. Optionally, the interface may be an SDK (Software Development Kit).
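For illustration, a plug-in style policy written against such an SDK might look like the following minimal Python sketch; the base class, method name and policy logic are assumptions introduced here for the example, not the actual interface shipped with the system:

    class ProcessingPolicy:
        """Assumed plug-in interface: the framework calls process() per record."""
        def process(self, record: dict) -> dict:
            raise NotImplementedError

    class CleanAndNormalize(ProcessingPolicy):
        """Example developer-defined policy: drop empty attributes, trim names."""
        def process(self, record: dict) -> dict:
            cleaned = {k: v for k, v in record.items() if v not in (None, "", [])}
            if "name" in cleaned:
                cleaned["name"] = cleaned["name"].strip()
            return cleaned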
Knowledge data is data of tens of billions of records or higher magnitude. Depending on the requirements, computation over knowledge data includes streaming computation and batch computation: streaming computation generally targets data that changes frequently, while batch computation generally targets relatively stable data, for example historical data such as records about emperors. In the embodiment of the present invention, the knowledge data computing framework of the data computation module 200 may include a streaming computing framework and/or a batch computing framework. Whether in streaming or in batch mode, the computation over knowledge data mainly includes data cleansing, attribute alignment, entity normalization, attribute aggregation and so on; how the computation is performed depends on the developer-defined data processing policy. As a computing framework, the data computation module 200 can support multiple application scenarios and the hundreds or thousands of data processing flows under each scenario, and supports horizontal scaling of computing capacity.
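As a minimal, hypothetical illustration of one such step, "attribute alignment" could amount to mapping source-specific attribute names onto a canonical schema (the alias table below is invented for the example):

    # Hypothetical attribute alignment: map source-specific attribute names
    # onto canonical ones; unknown keys are passed through unchanged.
    ATTRIBUTE_ALIASES = {"出生日期": "birthDate", "dob": "birthDate", "别名": "alias"}

    def align_attributes(record: dict) -> dict:
        return {ATTRIBUTE_ALIASES.get(k, k): v for k, v in record.items()}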
In a specific implementation, when facing different storage systems such as HBase and Hadoop, the data read from the storage system can be abstracted into data nodes (DataNode), which decouples data access from computation; the developer operates directly on data nodes and only needs to care about the specific computation logic.
After the knowledge data has been computed, entities are obtained. The knowledge graph update module 300 can associate a new entity with the entities that already exist in the knowledge graph, so that the entity is indexed into the knowledge graph and the knowledge graph is updated. For a product that uses the knowledge graph, the system makes it possible to update data in real time, to query relation hierarchies without a depth limit, and to satisfy very high request volumes.
In the technical solution of this embodiment, for the streaming computation scenarios and timeliness requirements in knowledge graph data production, a general-purpose knowledge data computing framework is provided, and an interface for developing data processing policies is provided to developers, so that policy developers can define their own data processing logic in the form of plug-ins, while data access is decoupled from computation. A developer does not need to build the whole architecture; they only need to focus on the specific computation logic of the data processing step and develop a stand-alone program, which saves development cost, and they can conveniently update and replace their data processing policy through the interface, i.e. the system supports arbitrarily changing data processing policies. The system also supports storing and reading/writing data of very high magnitude. In addition, the system supports developing data processing policies in multiple languages and makes containerized application deployment easy to scale out. Through the system, developers can create, publish and manage their own application instances.
Optionally, the above system may further include: a visualization platform, for providing the developer with an interface for visual control and monitoring. Based on the visualization platform, a developer can visually start, stop and supervise the applications they have deployed and monitor how their plug-ins are running. Here an application refers to a package consisting of a plug-in and its runner.
Embodiment two
On the basis of the above embodiments, this embodiment provides an implementation of the data computation module 200. The data computation module 200 includes a streaming computing sub-module and/or a batch computing sub-module. Depending on the actual scenario, the knowledge data computing system may be provided with only the streaming computing sub-module (suitable for streaming scenarios), with only the batch computing sub-module (suitable for batch scenarios), or with both sub-modules at the same time (suitable for streaming scenarios as well as batch scenarios).
Fig. 2 is a schematic structural diagram of the knowledge data computing system provided by Embodiment 2 of the present invention. As shown in Fig. 2, the data computation module 200 includes a streaming computing sub-module 210 and a batch computing sub-module 220.
The streaming computing sub-module 210 is used for providing the developer with an interface for developing stream processing policies, and for performing streaming computation on to-be-processed streaming knowledge data according to the developer-defined stream processing policy.
The streaming computing sub-module 210 can be understood as the streaming computing framework. It provides the developer with an interface for developing policies; the interface can be an SDK, and the developer can develop their own stream processing policy based on the SDK. The streaming computing framework can also obtain to-be-processed knowledge data and perform streaming computation on it according to the stream processing policy developed by the developer.
The batch computing sub-module 220 is used for providing the developer with an interface for developing batch processing policies, and for performing batch computation on to-be-processed batch knowledge data according to the developer-defined batch processing policy.
The batch computing sub-module 220 can be understood as the batch computing framework. It provides the developer with an interface for developing policies, and the developer can develop their own batch processing policy based on this interface. The batch computing framework can also obtain to-be-processed batch knowledge data and perform batch computation on it according to the batch processing policy developed by the developer.
Further, the streaming computing sub-module 210 includes a first data acquisition unit and a streaming computing unit.
The first data acquisition unit is used for obtaining the to-be-processed streaming knowledge data from the data storage module, and/or receiving crawled to-be-processed streaming knowledge data. The first data acquisition unit mainly provides read and write access to the data storage module 100 under streaming computation, and can also directly receive streaming knowledge data sent by a data crawling platform.
The streaming computing unit is used for running the plug-in of the stream processing policy and performing streaming computation on the to-be-processed streaming knowledge data. The processed data can be sent to the data storage module 100 for edge-building computation so as to update the knowledge graph.
Further, the batch computing sub-module 220 includes a second data acquisition unit and a batch computing unit.
The second data acquisition unit is used for obtaining the to-be-processed batch knowledge data from the data storage module. The second data acquisition unit mainly provides read and write access to the data storage module 100 under batch computation.
The batch computing unit is used for performing batch computation on the to-be-processed batch knowledge data according to the job type and job order in the batch processing policy.
The executable unit of the batch computing unit is a task, and a task may contain one job or multiple jobs. Job types include a local job type and a distributed job type. The local job type targets small-scale data, which can be run locally and computed on a single machine; the distributed job type targets large-scale, high-magnitude data (for example tens of billions or hundreds of billions of records), which has to be computed by a distributed cluster. If multiple jobs are to be executed, they can be executed one after another in the job order given in the user configuration file. In practice, the data flow of one product line can be packaged together, so that the multiple computation steps of that product line are executed, in order, as multiple jobs, which makes management convenient.
The batch computing unit is specifically used for:
for the local job type, if there is only one job, executing that job locally to perform batch computation on the to-be-processed batch knowledge data; if there are multiple jobs, executing each job locally in the job order to perform batch computation on the to-be-processed batch knowledge data;
for the distributed job type, converting the job into a distributed job, and sending the distributed job and the to-be-processed batch knowledge data to a distributed cluster for batch computation. What a distributed cluster executes is a distributed job, so the job in the knowledge data computing system has to be converted into a distributed job before the distributed cluster can carry out the batch computation. Specifically, the conversion can be implemented by encapsulating, in a language supported by the batch computing framework, the programming-interface commands of the distributed cluster, so that jobs can run on clusters of different versions.
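The following is a minimal sketch of how a task runner could dispatch jobs by type under the scheme described above; the dictionary layout and function names are assumptions made for illustration only:

    # Illustrative dispatch: local jobs run in-process in the configured order,
    # distributed jobs are handed to a cluster-submission callable.
    def run_task(jobs: dict, job_order: list, submit_to_cluster) -> None:
        for name in job_order:
            job = jobs[name]
            if job["type"] == "local":
                job["run"]()                   # single-machine execution
            else:
                submit_to_cluster(job)         # converted and run on the cluster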
This embodiment provides a streaming computing framework and a batch computing framework, which can support the data processing flows required in different scenarios. The batch computing framework involves two job types in order to satisfy different needs.
Optionally, the above system may further include: a virtual cloud platform, for providing the runtime environment of applications.
Optionally, the above system may further include: a remote alarm interface, for outputting alarm information to remind the developer when a data processing policy fails to run. The alarm information can take many forms, for example e-mail or SMS; for instance, if an application fails while running, an alarm e-mail can be sent to the developer.
Optionally, the above system may further include: a log printing interface, for printing a log after each data processing run and for classifying and managing the logs appropriately so that they can be monitored, which helps optimize the data processing policy. For example, one may monitor whether the data processing reaches the expected level; if it does not, some parameters in the data processing policy can be tuned.
Embodiment three
On the basis of the above embodiments, this embodiment provides a preferred implementation of the knowledge data computing system. As shown in Fig. 3, the system is divided, from bottom to top, into an infrastructure layer, a computing framework layer and an offline policy layer.
The infrastructure layer is mainly used for providing the runtime environment of applications (APPs), high-magnitude storage, and a graph retrieval service.
KGBase (corresponding to the data storage module 100 in the above embodiments) is obtained by encapsulating the open-source HBase and Hadoop, so as to implement high-magnitude data storage, and the storage format of the knowledge graph data is defined based on the W3C JSON-LD standard.
GI (Graph Index, corresponding to the knowledge graph update module 300 in the above embodiments) provides the graph retrieval service over the graph data in KGBase; that is, it performs edge-building computation on the data produced by stream processing or batch processing, obtains the associations between entities, builds the graph index, and updates the knowledge graph.
The virtual cloud platform provides the runtime environment for applications; this embodiment uses Baidu's JPAAS (Platform as a Service) platform.
Kafka is used for transmitting streaming knowledge data: it forwards streaming knowledge data received in real time to the computing framework layer, or obtains to-be-processed streaming knowledge data from KGBase and forwards it to the computing framework layer, so that the streaming knowledge data can be computed according to the stream processing policy.
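As a hedged sketch of the kind of consumption loop this implies (the topic name and the choice of the kafka-python client are assumptions; the original text does not specify them), a streaming runner might read records from Kafka roughly as follows:

    # Illustrative only: consume streaming knowledge records and apply a
    # stand-in for a developer-defined stream processing policy.
    import json
    from kafka import KafkaConsumer  # kafka-python client, chosen as an assumption

    def process(record: dict) -> dict:
        """Stand-in for a developer-defined stream processing plug-in."""
        return {k: v for k, v in record.items() if v is not None}

    consumer = KafkaConsumer("knowledge-data",                # hypothetical topic
                             bootstrap_servers="localhost:9092")
    for message in consumer:
        processed = process(json.loads(message.value))
        # in the system, the processed record would be written back to KGBase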
MAPRED (i.e. MapReduce) is used for transmitting batch knowledge data: it obtains to-be-processed batch knowledge data from KGBase and forwards it to the computing framework layer, so that the batch knowledge data can be computed according to the batch processing policy.
The computing framework layer provides the streaming computing framework (Mario) and the batch computing framework (Stone) to support big data processing in different scenarios, and corresponds to the data computation module 200 in the above embodiments. The streaming computing framework is suitable for streaming scenarios, for example data that changes frequently; the batch computing framework is suitable for batch scenarios, for example stable, unchanging data.
Mario-SDK is used for providing developers with the interface for developing stream processing policies. Mario-admin is the streaming computation management platform and is responsible for assembling executable plug-ins with their runners. A package consisting of a stream processing policy plug-in and a runner is run on JPAAS, which realizes streaming computation over the streaming knowledge data. An RPC (Remote Procedure Call Protocol) interface provides the read and write operations on the data in KGBase for the streaming computing framework. Stone is the batch computing framework in this embodiment, and executes jobs to realize batch computation. A REST (Representational State Transfer) API (Application Programming Interface) provides the read and write operations on the data in KGBase for the batch computing framework.
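For illustration only (the host, endpoint paths and field names below are assumptions; the actual KGBase REST API is not described in the text), a batch job might read and write an entity over such a REST interface like this:

    # Illustrative REST round trip against a hypothetical KGBase endpoint.
    import requests

    BASE = "http://kgbase.example.com/api/v1"                   # hypothetical host
    entity = requests.get(f"{BASE}/entities/person_001").json() # read an entity
    entity["alias"] = ["Example Alias"]
    requests.put(f"{BASE}/entities/person_001", json=entity)    # write it back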
The offline policy layer mainly integrates the policy processing programs by supporting developer-defined plug-ins. A Mario-app refers to a package, consisting of a runner and a plug-in (plugin), that can run on the virtual cloud platform, where the plug-in is the stream processing policy developed by the developer. A Stone-job refers to a job that the batch computing framework can execute; it is implemented by the developer through the interface. In practice, the knowledge data computing system may serve many developers, and the offline policy layer may hold a variety of data processing policies; the Mario-app and Stone-job shown in Fig. 3 are only examples and do not limit their number.
The streaming computing framework and the batch computing framework are described separately below.
(1) The Mario streaming computing framework
The Mario streaming computing framework includes: Mario-admin, Mario-runner, Mario-SDK, Mario-plugin and Mario-app.
Mario-SDK is used for providing the interface through which developers develop stream processing policies.
Mario-plugin refers to a stream processing policy developed on top of the SDK; in this embodiment the stream processing policy takes the form of a plug-in.
Mario-admin is the streaming computation management platform and is responsible for assembling executable plug-ins (plugin) with runners (runner).
Mario-runner is the runner that actually executes the plug-in.
Mario-app: a package, consisting of a runner (runner) and a plug-in (plugin), that can run on the virtual cloud platform. Optionally, the virtual cloud platform can be a container cloud, such as Baidu's JPAAS. The package can be a tar package, or any other form that does not affect normal operation.
(2) The Stone batch computing framework
Stone is the execution framework for offline batch computation plug-ins. It provides a flexible set of programming interfaces and can support multiple languages, such as Python and C++; the developer implements the interface according to the programming specification.
The executable unit of batch computation is a task (Task), and a task contains one or more jobs (Job). If there are multiple jobs, the developer gives the execution order of the jobs in a configuration file, and the jobs are executed in that order when the batch computation is carried out. Each user-defined program is one job. A job includes: the developer-defined computation program, and the input (Input), output (Output) and configuration (Configuration) that the program depends on.
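As an illustrative sketch of such a configuration (the format and keys are assumptions; the original text does not show a concrete configuration file), a task naming three jobs and the order to run them in might look like this:

    # Hypothetical task configuration: jobs, their types and their order.
    task_config = {
        "task": "person_kg_build",
        "job_order": ["clean", "align", "aggregate"],
        "jobs": {
            "clean":     {"type": "local",       "input": "kgbase://raw/person"},
            "align":     {"type": "local",       "input": "kgbase://clean/person"},
            "aggregate": {"type": "distributed", "cluster": "hadoop-cluster-01"},
        },
    }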
Job types include: the local job type (local class) and the distributed job type (pyhce class). The local job type targets small-scale data, which can be run locally and computed on a single machine; the distributed job type targets large-scale, high-magnitude data (for example tens of billions or hundreds of billions of records), which has to be computed by a distributed cluster.
Fig. 4 is a schematic structural diagram of the Stone batch computing framework. The parts in the figure are explained as follows:
Task-Loader (task loader): loads the plug-ins of the batch processing policy so as to assemble a task.
Controller: controls the interaction between the modules of the batch computing framework, as indicated by the arrows in the figure.
IO-Adaptor (input/output adapter): connects the knowledge data in KGBase to each job and performs data adaptation. Based on the IO-Adaptor, the read and write operations on the data in KGBase can be realized.
Lc-runner (local runner): executes jobs of the local job type locally, realizing the computation over the batch knowledge data. If there are multiple jobs, they are executed one after another in the job order given in the configuration file.
Cluster interaction module: this module is obtained by encapsulating, in a language supported by the batch computing framework (for example Python), the programming-interface commands of the distributed cluster, so as to support running on clusters of different versions. The module converts a job of the distributed job type into a distributed job that can run on a distributed cluster, and sends the distributed job and the to-be-processed batch knowledge data to the corresponding distributed cluster; the distributed cluster executes the distributed job, realizing the computation over the batch knowledge data. If there are multiple jobs, they are executed one after another in the job order given in the configuration file.
Fig. 4 takes the Python language and a Hadoop cluster as an example: the Hadoop Streaming commands are encapsulated with Python to obtain Pystreaming (i.e. the cluster interaction module). Using the Pystreaming module, a job is converted into a MapReduce job that can run on the designated Hadoop cluster, and the program written in Python is submitted to the designated Hadoop cluster as a Streaming task, so that the computation over the batch knowledge data is carried out by the distributed cluster.
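As a hedged illustration of what wrapping the Hadoop Streaming command could look like (the jar path and the example paths are assumptions; the actual Pystreaming code is not shown in the disclosure), such a wrapper might build and launch a command as follows:

    # Illustrative only: submit a Python mapper/reducer as a Hadoop Streaming job.
    import subprocess

    def submit_streaming_job(input_path, output_path, mapper, reducer):
        cmd = [
            "hadoop", "jar",
            "/opt/hadoop/share/hadoop/tools/lib/hadoop-streaming.jar",  # assumed path
            "-input", input_path,
            "-output", output_path,
            "-mapper", f"python {mapper}",
            "-reducer", f"python {reducer}",
            "-file", mapper, "-file", reducer,
        ]
        subprocess.run(cmd, check=True)   # raises if the submitted job fails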
It can be seen from the above that the system is a general-purpose computing system built on distributed storage, the distributed publish-subscribe messaging system Kafka, plug-in management, and Docker container-cloud deployment. For the developer's convenience, the system provides developers with interfaces that support multiple languages, and a developer can develop their own data processing policy plug-in based on these interfaces. For streaming computation, a developer can create, publish and manage their own application (app) instances through the Mario management platform, visually start, stop and supervise the applications deployed in the container cloud (Container), and monitor how their plug-ins are running.
Fig. 5 is a schematic diagram of the knowledge data computation provided by Embodiment 3 of the present invention, which covers data ingestion, data processing and data application.
Data ingestion refers to crawling knowledge data and can be implemented by existing platforms, such as KGopen (Baidu's knowledge graph open platform), Aladdin (Baidu's open platform) and Spider (Baidu's crawler) shown in Fig. 5.
Data application refers to the specific products that use the knowledge graph, such as the question answering, recommendation and Aladdin products shown in Fig. 5. The data in KGBase is built into libraries in streaming mode and in batch mode for these products to call.
Data processing refers to the computation over the knowledge data and is realized by the knowledge data computing system of the embodiment of the present invention. The data crawled by the data ingestion part is transferred into the Kafka message queue (Message Queue) of the knowledge data computing system for buffering. On one hand, under the control of the Mario-admin management platform, the to-be-processed streaming knowledge data in the Kafka message queue can be delivered directly to the streaming computing framework; the Mario-app runs on JPAAS and completes the computation over the streaming knowledge data, and the computed data is transferred to KGBase to update the knowledge graph. On the other hand, the knowledge data in the Kafka message queue may be stored into KGBase; the batch computing framework uses MAPRED to obtain to-be-processed batch knowledge data from KGBase, executes jobs (Job) to complete the computation over the batch knowledge data, and transfers the computed data to KGBase to update the knowledge graph.
Embodiment four
Fig. 6 is a flowchart of the knowledge data computing method provided by Embodiment 4 of the present invention. The method can be implemented on the knowledge data computing system of the above embodiments; it supports developer-defined data processing policies, reduces development cost, and supports the data processing flows required in different scenarios.
As shown in Fig. 6, the method includes:
S601: obtaining to-be-processed knowledge data.
To-be-processed streaming knowledge data and/or batch knowledge data can be obtained from the data storage module; crawled to-be-processed streaming knowledge data can also be received directly.
S602: under the computing framework corresponding to the to-be-processed knowledge data, computing the to-be-processed knowledge data according to a developer-defined data processing policy, where the data processing policy is defined by the developer through a preset interface.
The computing frameworks include a streaming computing framework and a batch computing framework; both provide an interface through which the developer can define a data processing policy, and the corresponding knowledge data can then be computed according to the data processing policy defined by the developer.
S603: updating the knowledge graph in the data storage module according to the processed data.
After the knowledge data has been computed, entities are obtained; an association can be established between a new entity and the entities already in the knowledge graph, so that the entity is indexed into the knowledge graph and the knowledge graph is updated, as sketched below.
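A minimal sketch of this edge-building and indexing step (the matching rule, field names and in-memory stand-ins for KGBase are all assumptions introduced for illustration) might look like:

    # Illustrative only: add a new entity and connect it to existing entities
    # whose names it references, recording the resulting edges.
    def index_entity(entity: dict, graph_entities: dict, edges: list) -> list:
        graph_entities[entity["@id"]] = entity
        for ref in entity.get("related_names", []):          # hypothetical field
            for other_id, other in graph_entities.items():
                if other.get("name") == ref:
                    edges.append((entity["@id"], "relatedTo", other_id))
        return edges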
Optionally, S602 includes: if the to-be-processed knowledge data is streaming knowledge data, running the plug-in of the stream processing policy defined by the developer and performing streaming computation on the streaming knowledge data; if the to-be-processed knowledge data is batch knowledge data, performing batch computation on the batch knowledge data according to the batch processing policy defined by the developer.
Optionally, performing batch computation on the batch knowledge data according to the developer-defined batch processing policy includes:
obtaining the job type and job order in the batch processing policy, wherein the job type includes a local job type and a distributed job type;
for the local job type, executing each job locally in the job order to perform batch computation on the to-be-processed batch knowledge data;
for the distributed job type, converting the job into a distributed job, and sending the distributed job and the to-be-processed batch knowledge data to a distributed cluster for batch computation.
The job type and job order can be obtained from the policy configuration file. For the local job type, if there is only one job, that job is executed locally to perform batch computation on the batch knowledge data; if there are multiple jobs, each job is executed locally in the job order to perform batch computation on the batch knowledge data.
For the distributed job type, the job is sent to a distributed cluster for batch computation. What a distributed cluster executes is a distributed job, so the job in the knowledge data computing system needs to be converted into a distributed job before the distributed cluster can carry out the batch computation. Specifically, the conversion can be implemented by encapsulating, in a language supported by the batch computing framework, the programming-interface commands of the distributed cluster, so that jobs can run on clusters of different versions.
As an embodiment, computing the to-be-processed knowledge data according to the developer-defined data processing policy under the computing framework corresponding to the to-be-processed knowledge data includes: outputting alarm information when the data processing policy fails to run, so as to report the error to the developer.
Optionally, after the to-be-processed knowledge data is computed according to the developer-defined data processing policy in S602, the above method may further include: printing a log after each data processing run. On the basis of the printed logs, the logs are classified and managed appropriately so that they can be monitored, which helps optimize the data processing policy.
Embodiment five
This embodiment provides a server. The server includes: one or more processors; and a memory for storing one or more programs, where, when the one or more programs are executed by the one or more processors, the one or more processors implement the knowledge data computing method described in the above embodiments.
Fig. 7 is a schematic structural diagram of the server provided by Embodiment 5 of the present invention. Fig. 7 shows a block diagram of an exemplary server 12 suitable for implementing the embodiments of the present invention. The server 12 shown in Fig. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 7, the server 12 takes the form of a general-purpose computing device. The components of the server 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that connects the different system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus.
The server 12 typically includes a variety of computer-system-readable media. These media can be any usable media that can be accessed by the server 12, including volatile and non-volatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. Merely as an example, the storage system 34 can be used to read and write non-removable, non-volatile magnetic media (not shown in Fig. 7, commonly referred to as a "hard disk drive"). Although not shown in Fig. 7, a disk drive for reading and writing removable non-volatile magnetic disks (such as "floppy disks") and an optical disc drive for reading and writing removable non-volatile optical discs (such as CD-ROM, DVD-ROM or other optical media) can be provided. In these cases, each drive can be connected to the bus 18 through one or more data media interfaces. The system memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to carry out the functions of the embodiments of the present invention.
A program/utility 40 having a set of (at least one) program modules 42 can be stored, for example, in the system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present invention.
The server 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), and may also communicate with one or more devices that enable a user to interact with the server 12, and/or with any device (such as a network card or a modem) that enables the server 12 to communicate with one or more other computing devices. Such communication can take place through an input/output (I/O) interface 22. Moreover, the server 12 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the server 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in combination with the server 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processing unit 16 executes various functional applications and data processing by running the programs stored in the system memory 28, for example implementing the knowledge data computing method provided by the embodiments of the present invention.
Embodiment six
Embodiment 6 of the present invention further provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the knowledge data computing method described in the above embodiments is implemented.
The computer storage medium of the embodiment of the present invention may use any combination of one or more computer-readable media. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program which can be used by, or in connection with, an instruction execution system, apparatus or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and can send, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device.
The program code contained on a computer-readable medium can be transmitted using any appropriate medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here; various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, the present invention is not limited to the above embodiments and may include other equivalent embodiments without departing from the inventive concept, and the scope of the present invention is determined by the scope of the appended claims.

Claims (14)

  1. A knowledge data computing system, characterized by comprising:
    a data storage module, for storing knowledge data and a knowledge graph;
    a data computation module, for providing developers with an interface for developing data processing policies, and computing to-be-processed knowledge data according to the developer-defined data processing policy;
    a knowledge graph update module, for updating the knowledge graph in the data storage module according to the processed data.
  2. The system according to claim 1, characterized in that the data computation module comprises a streaming computing sub-module and/or a batch computing sub-module;
    the streaming computing sub-module, for providing the developer with an interface for developing stream processing policies, and performing streaming computation on to-be-processed streaming knowledge data according to the developer-defined stream processing policy;
    the batch computing sub-module, for providing the developer with an interface for developing batch processing policies, and performing batch computation on to-be-processed batch knowledge data according to the developer-defined batch processing policy.
  3. The system according to claim 2, characterized in that the streaming computing sub-module comprises:
    a first data acquisition unit, for obtaining the to-be-processed streaming knowledge data from the data storage module, and/or receiving crawled to-be-processed streaming knowledge data;
    a streaming computing unit, for running the plug-in of the stream processing policy and performing streaming computation on the to-be-processed streaming knowledge data.
  4. The system according to claim 2, characterized in that the batch computing sub-module comprises:
    a second data acquisition unit, for obtaining the to-be-processed batch knowledge data from the data storage module;
    a batch computing unit, for performing batch computation on the to-be-processed batch knowledge data according to the job type and job order in the batch processing policy.
  5. The system according to claim 4, characterized in that the job type comprises a local job type and a distributed job type;
    the batch computing unit is specifically used for:
    for the local job type, executing each job locally in the job order to perform batch computation on the to-be-processed batch knowledge data;
    for the distributed job type, converting the job into a distributed job, and sending the distributed job and the to-be-processed batch knowledge data to a distributed cluster for batch computation.
  6. The system according to any one of claims 1-5, characterized in that the data storage module is obtained by encapsulating HBase and Hadoop.
  7. The system according to any one of claims 1-5, characterized in that the system further comprises:
    a remote alarm interface, for outputting alarm information when the data processing policy fails to run.
  8. The system according to any one of claims 1-5, characterized in that the system further comprises:
    a visualization platform, for providing the developer with an interface for visual control and monitoring.
  9. A knowledge data computing method, implemented on the knowledge data computing system according to any one of claims 1-8, characterized by comprising:
    obtaining to-be-processed knowledge data;
    under the computing framework corresponding to the to-be-processed knowledge data, computing the to-be-processed knowledge data according to a developer-defined data processing policy, wherein the data processing policy is defined by the developer through a preset interface;
    updating the knowledge graph in the data storage module according to the processed data.
  10. The method according to claim 9, characterized in that computing the to-be-processed knowledge data according to a developer-defined data processing policy under the computing framework corresponding to the to-be-processed knowledge data comprises:
    if the to-be-processed knowledge data is streaming knowledge data, running the plug-in of the stream processing policy defined by the developer and performing streaming computation on the streaming knowledge data;
    if the to-be-processed knowledge data is batch knowledge data, performing batch computation on the batch knowledge data according to the batch processing policy defined by the developer.
  11. The method according to claim 10, characterized in that performing batch computation on the batch knowledge data according to the developer-defined batch processing policy comprises:
    obtaining the job type and job order in the batch processing policy, wherein the job type comprises a local job type and a distributed job type;
    for the local job type, executing each job locally in the job order to perform batch computation on the to-be-processed batch knowledge data;
    for the distributed job type, converting the job into a distributed job, and sending the distributed job and the to-be-processed batch knowledge data to a distributed cluster for batch computation.
  12. The method according to any one of claims 9-11, characterized in that computing the to-be-processed knowledge data according to a developer-defined data processing policy under the computing framework corresponding to the to-be-processed knowledge data comprises:
    outputting alarm information when the data processing policy fails to run.
  13. A server, characterized in that the server comprises:
    one or more processors;
    a memory, for storing one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the knowledge data computing method according to any one of claims 9-12.
  14. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the knowledge data computing method according to any one of claims 9-12.
CN201711297667.2A 2017-12-08 2017-12-08 Knowledge data computing system, method, server and storage medium Pending CN107944012A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711297667.2A CN107944012A (en) 2017-12-08 2017-12-08 Knowledge data computing system, method, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711297667.2A CN107944012A (en) 2017-12-08 2017-12-08 Knowledge data computing system, method, server and storage medium

Publications (1)

Publication Number Publication Date
CN107944012A true CN107944012A (en) 2018-04-20

Family

ID=61946279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711297667.2A Pending CN107944012A (en) 2017-12-08 2017-12-08 Knowledge data computing system, method, server and storage medium

Country Status (1)

Country Link
CN (1) CN107944012A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019210523A1 (en) * 2018-05-04 2019-11-07 深圳晶泰科技有限公司 Universal force field database and updating method and retrieval method therefor
CN110928529A (en) * 2019-11-06 2020-03-27 第四范式(北京)技术有限公司 Method and system for assisting operator development
CN111177199A (en) * 2019-12-31 2020-05-19 中国银行股份有限公司 Stream type calculation index generation system based on structured stream
WO2020143326A1 (en) * 2019-01-11 2020-07-16 平安科技(深圳)有限公司 Knowledge data storage method, device, computer apparatus, and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106168965A (en) * 2016-07-01 2016-11-30 竹间智能科技(上海)有限公司 Knowledge mapping constructing system
CN106557470A (en) * 2015-09-24 2017-04-05 腾讯科技(北京)有限公司 data extraction method and device
US20170193393A1 (en) * 2016-01-04 2017-07-06 International Business Machines Corporation Automated Knowledge Graph Creation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106557470A (en) * 2015-09-24 2017-04-05 腾讯科技(北京)有限公司 data extraction method and device
US20170193393A1 (en) * 2016-01-04 2017-07-06 International Business Machines Corporation Automated Knowledge Graph Creation
CN106168965A (en) * 2016-07-01 2016-11-30 竹间智能科技(上海)有限公司 Knowledge mapping constructing system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
程学旗 (Cheng Xueqi) et al.: "大数据系统和分析技术综述" ("A survey of big data systems and analytics technology"), 《软件学报》 (Journal of Software) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019210523A1 (en) * 2018-05-04 2019-11-07 深圳晶泰科技有限公司 Universal force field database and updating method and retrieval method therefor
WO2020143326A1 (en) * 2019-01-11 2020-07-16 平安科技(深圳)有限公司 Knowledge data storage method, device, computer apparatus, and storage medium
CN110928529A (en) * 2019-11-06 2020-03-27 第四范式(北京)技术有限公司 Method and system for assisting operator development
CN111177199A (en) * 2019-12-31 2020-05-19 中国银行股份有限公司 Stream type calculation index generation system based on structured stream
CN111177199B (en) * 2019-12-31 2023-05-02 中国银行股份有限公司 Stream type calculation index generation system based on structured stream


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180420