CN110489755A - Document creation method and device - Google Patents
- Publication number
- CN110489755A (application number CN201910775353.1A)
- Authority
- CN
- China
- Prior art keywords
- vector
- entity
- attribute
- text
- attribute value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a text generation method and device. The method includes: selecting the target knowledge graph of a target entity from a knowledge graph set, where the knowledge graph set characterizes the attribute values of at least one entity on preset attributes, and the target entity is the object to be evaluated; determining the entity vector, attribute vector and attribute value vector of the target entity based on the target knowledge graph, where the entity vector, attribute vector and attribute value vector are represented as a triple vector; and generating text matching the target entity according to the entity vector, attribute vector and attribute value vector. The invention solves the technical problem in the related art that text generated solely by deep learning algorithms lacks personalized evaluation of an entity, so that the generated text does not match the entity's actual behavior well.
Description
Technical field
The present invention relates to the field of natural language processing, and in particular to a text generation method and device.
Background art
Text generation is an important research direction in the field of natural language processing (Natural Language Processing, NLP). It aims to automatically generate, by means of rules, algorithms and the like, sentences that conform to the rules of human language and contain no grammatical errors.
Text generation technology has many applications. For example, in the education sector, at the end of every term a teacher needs to write a descriptive, suggestive comment on each student's performance based on the student's daily behavior. The traditional way of producing a comment for each student relies mostly on the teacher writing it by hand. This not only consumes a great deal of the teacher's time; the teacher may also not accurately remember the daily performance of every student. A relatively mature existing solution is therefore to compute the similarity between the input student information and manually constructed comment templates, and to choose the template with the highest similarity as the generated comment.
However, the comments produced by this method are constructed manually rather than generated by an algorithm, so the method cannot generate different, personalized comments for each student in a batch and intelligent manner. In addition, since the comment is obtained by computing the similarity between the student information and the comment templates, this approach only considers surface-level character information and ignores the semantic information of the comment text. To address this, deep learning algorithms consider the statistical distribution of text in multiple dimensions and generate comments probabilistically. However, deep learning algorithms lack education-specific information and the ability to learn the latent relationship between a particular student's daily behavior and the corresponding comment, and thus lack the ability to generate personalized comments for a particular student. As a result, the comments generated by deep learning algorithms do not match a student's actual performance well and are inaccurate.
For the technical problem in the related art that text generated solely by deep learning algorithms lacks personalized evaluation of an entity, so that the generated text does not match the entity's actual behavior well, no effective solution has yet been proposed.
Summary of the invention
The present invention provides a text generation method and device, at least to solve the technical problem in the related art that text generated solely by deep learning algorithms lacks personalized evaluation of a student, so that the generated text does not match the student's actual performance well.
According to one aspect of the embodiments of the present invention, a text generation method is provided, including: selecting the target knowledge graph of a target entity from a knowledge graph set, where the knowledge graph set characterizes the attribute values of at least one entity on preset attributes, and the target entity is the object to be evaluated; determining the entity vector, attribute vector and attribute value vector of the target entity based on the target knowledge graph, where the entity vector, attribute vector and attribute value vector are represented as a triple vector; and generating text matching the target entity according to the entity vector, attribute vector and attribute value vector.
Optionally, before selecting the target knowledge graph of the target entity from the knowledge graph set, the method further includes generating the knowledge graph set, where generating the knowledge graph set includes: constructing the schema layer of the knowledge graph set, where the schema layer includes at least entity types, attribute types and attribute value types; obtaining record information, where the record information includes the attribute values of at least one entity on the preset attributes; and inputting the record information into the schema layer to generate the knowledge graph set.
Optionally, before inputting the record information into the schema layer, the method further includes: preprocessing the record information to obtain processed record information, where the preprocessing includes at least one of the following: entity extraction, attribute extraction, attribute value extraction and entity disambiguation.
Optionally, determining the entity vector, attribute vector and attribute value vector of the target entity based on the target knowledge graph includes: extracting the entity information, attribute information and attribute value information of the target entity from the target knowledge graph; and converting the entity information into a Boolean vector using a preset algorithm, and converting the attribute information and attribute value information into high-dimensional numeric vectors using a preset model, to obtain the triple vector.
Optionally, generating text matching the target entity according to the entity vector, attribute vector and attribute value vector includes: inputting the entity vector, attribute vector and attribute value vector into a text generation model, where the text generation model includes a deep neural network model trained on triple samples and text samples; and generating the text matching the target entity based on the text generation model.
Optionally, before inputting the entity vector, attribute vector and attribute value vector into the text generation model, the method further includes generating the text generation model, where generating the text generation model includes: obtaining triple samples and text samples; converting the entity samples in the triple samples into Boolean vectors using the preset algorithm, and converting the attribute samples and attribute value samples in the triple samples into high-dimensional numeric vectors using the preset model, to obtain triple vector samples; and training the text generation model based on the triple vector samples and text samples to obtain a trained text generation model.
Optionally, training the text generation model based on the triple vector samples and text samples to obtain the trained text generation model includes: processing the triple vector samples and text samples with an encoder incorporating an attention mechanism to obtain a context vector; processing the context vector with a decoder incorporating the attention mechanism to obtain text information; and, based on the text information, training the text generation model by minimizing a loss function.
According to another aspect of the embodiments of the present invention, a text generation method is also provided, including: receiving a selection instruction, where the selection instruction is used to select a target entity to be evaluated; and displaying text matching the target entity, where the text is generated according to the entity vector, attribute vector and attribute value vector of the target entity determined based on the target knowledge graph of the target entity, the target knowledge graph comes from a knowledge graph set, the knowledge graph set characterizes the attribute values of at least one entity on preset attributes, and the entity vector, attribute vector and attribute value vector are represented as a triple vector.
According to another aspect of the embodiments of the present invention, a text generation device is also provided, including: a selecting module, configured to select the target knowledge graph of a target entity from a knowledge graph set, where the knowledge graph set characterizes the attribute values of at least one entity on preset attributes, and the target entity is the object to be evaluated; a determining module, configured to determine the entity vector, attribute vector and attribute value vector of the target entity based on the target knowledge graph, where the entity vector, attribute vector and attribute value vector are represented as a triple vector; and a text generation module, configured to generate text matching the target entity according to the entity vector, attribute vector and attribute value vector.
According to another aspect of the embodiments of the present invention, a storage medium is also provided. The storage medium includes a stored program, where, when the program runs, the device on which the storage medium resides is controlled to execute any one of the above text generation methods.
According to another aspect of the embodiments of the present invention, a processor is also provided. The processor is used to run a program, where the program, when running, executes any one of the above text generation methods.
In the embodiments of the present invention, the target knowledge graph of the target entity is selected from the knowledge graph set, where the knowledge graph set characterizes the attribute values of at least one entity on preset attributes and the target entity is the object to be evaluated; the entity vector, attribute vector and attribute value vector of the target entity are determined based on the target knowledge graph, where the entity vector, attribute vector and attribute value vector are represented as a triple vector; and text matching the target entity is generated according to the entity vector, attribute vector and attribute value vector. Compared with the prior art, the present application builds a knowledge graph set from the daily behavior of multiple entities, extracts the triple vector of the target knowledge graph from it, and then generates the comment in combination with a deep learning algorithm. By combining the knowledge graph with deep learning, the scheme gives the deep learning algorithm access to all the attributes of the entity, thereby solving the technical problem in the related art that text generated solely by deep learning algorithms lacks personalized evaluation of an entity and does not match the entity's actual behavior well, achieving the goal of generating comments that best match the entity's daily behavior, and improving the matching degree of the comments.
Brief description of the drawings
The drawings described herein are used to provide a further understanding of the present invention and constitute part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of the present invention. In the drawings:
Fig. 1 is a flowchart of a text generation method according to Embodiment 1 of the present application;
Fig. 2 is a block diagram of the basic principle of a comment generation method according to Embodiment 1 of the present application;
Fig. 3 is a detailed schematic diagram of the basic principle of the comment generation method shown in Fig. 2;
Fig. 4 is a flowchart of a text generation method according to Embodiment 2 of the present application;
Fig. 5 is a structural schematic diagram of a text generation device according to Embodiment 3 of the present application; and
Fig. 6 is a structural schematic diagram of a text generation device according to Embodiment 4 of the present application.
Specific embodiment
In order to enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described below clearly and completely in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second" and the like in the description, claims and drawings of this specification are used to distinguish similar objects, not to describe a particular order or precedence. It should be understood that data used in this way are interchangeable under appropriate circumstances, so that the embodiments of the present invention described herein can be implemented in sequences other than those illustrated or described herein. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to that process, method, product or device.
Embodiment 1
According to the embodiments of the present invention, an embodiment of a text generation method is provided. It should be noted that the steps illustrated in the flowcharts of the drawings can be executed in a computer system such as a set of computer-executable instructions, and, although a logical order is shown in the flowcharts, in some cases the steps shown or described can be executed in an order different from that given here.
Fig. 1 is a flowchart of a text generation method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: select the target knowledge graph of a target entity from a knowledge graph set, where the knowledge graph set characterizes the attribute values of at least one entity on preset attributes, and the target entity is the object to be evaluated.
In an optional scheme, the entity can be any object that needs to be evaluated, such as a student, an organization or a company employee. For a student, the preset attributes can be classroom performance, self-image, social performance, emotional expression, weekly test scores, final grades and the like, and the corresponding attribute values can be active, tidy, positive, stable, fluctuating, excellent and so on. For an organization, the preset attributes can be brand image, number of granted patents, annual revenue, public welfare activity and the like, and the corresponding attribute values can be highly influential, greater than 100, 200 million, active and so on.
As a new knowledge organization and retrieval technique of the big data era, a knowledge graph (Knowledge Graph, KG) is used to describe concepts in the physical world and the relations between them in symbolic form. A knowledge graph set collects the knowledge graphs of multiple entities; the knowledge graph of each entity records the daily behavior of that entity, and since each entity is an independent individual, the knowledge graph of each entity is naturally different. When an entity needs to be evaluated, i.e. the target entity, the target knowledge graph of the target entity is selected from the knowledge graph set.
Taking a student as an example, when a summary comment needs to be generated for student A, the knowledge graph of student A is extracted from the knowledge graph set. This knowledge graph records the attribute values of student A on all attributes, that is, it records the daily behavior of student A in all aspects.
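The selection in step S102 can be illustrated with a toy knowledge-graph set stored as (entity, attribute, value) triples. This is a minimal sketch: the in-memory list representation and the specific student attributes are illustrative assumptions, not the patent's actual storage format.

```python
# A knowledge-graph set as a flat list of (entity, attribute, value) triples;
# selecting the target knowledge graph means keeping the target's triples.

def select_target_graph(kg_set, target_entity):
    """Return all triples whose head is the target entity (step S102)."""
    return [t for t in kg_set if t[0] == target_entity]

kg_set = [
    ("student_A", "classroom_performance", "active"),
    ("student_A", "self_image", "tidy"),
    ("student_A", "final_grade", "A"),
    ("student_B", "classroom_performance", "sleepy"),
    ("student_B", "final_grade", "B"),
]

target_graph = select_target_graph(kg_set, "student_A")
```

The returned subgraph is exactly the per-entity record that the later vectorization steps operate on.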
Step S104: determine the entity vector, attribute vector and attribute value vector of the target entity based on the target knowledge graph, where the entity vector, attribute vector and attribute value vector are represented as a triple vector.
In the above step, by extracting the entity information, attribute information and attribute value information of the target entity from the target knowledge graph and converting them into an entity vector, attribute vector and attribute value vector that are easy for the text generation model to process, the matching degree of the generated text can be greatly improved.
It should be noted that the triple is a common representation form of knowledge graphs; this embodiment uses triples as an example, which does not constitute a limitation of the present application.
Step S106: generate text matching the target entity according to the entity vector, attribute vector and attribute value vector.
In an optional scheme, the text generation model used to generate the text can be a deep neural network model. Deep neural networks are an interdisciplinary subject combining mathematics and computer science. Unlike conventional machine learning, deep neural networks can achieve end-to-end high-dimensional feature extraction and abstraction, solving the problem that features are difficult to extract in machine learning. Typical examples include the Seq2Seq model and the generative adversarial network model.
Seq2Seq is a model with an encoder-decoder structure. Its basic idea is to use two recurrent neural networks, one as the encoder and one as the decoder: the encoder turns a variable-length input sequence into a fixed-length vector, which is regarded as the semantics of the sequence, and the decoder decodes this fixed-length vector into a variable-length output sequence. A generative adversarial network (Generative Adversarial Networks, GAN) model includes at least two modules, a generative model and an adversarial model, which learn from each other through gaming and together produce fairly good output. Applying these two kinds of deep neural network algorithms to comment generation can therefore achieve more accurate and robust results than machine learning methods.
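The attention mechanism mentioned in the claims can be sketched in isolation: inside an encoder-decoder model, the decoder's query attends over the encoder hidden states to form a fixed-length context vector. The following dependency-free toy computes scaled-down dot-product attention; the vectors, dimensions and values are illustrative assumptions, not the patent's actual model.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_context(query, encoder_states):
    """Softmax over query-state dot-product scores, then a weighted sum of states."""
    scores = [dot(query, h) for h in encoder_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Three encoder hidden states and one decoder query (illustrative values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
weights, context = attention_context(query, encoder_states)
```

States similar to the query receive higher weights, so the context vector emphasizes the most relevant parts of the input, which is what lets the decoder focus on different attributes at each generation step.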
In the above step, the triple vector composed of the entity vector, attribute vector and attribute value vector determined from the target knowledge graph is input into the deep neural network model, which can then generate a comment text matching the daily behavior of the target entity.
It is easy to note that, in the existing text generation field, even the text generation methods based on knowledge graphs do not fully use the entity information, attribute information, attribute values and other information of the knowledge graph itself; instead, they use the knowledge graph as an intermediary and then find a suitable text by searching or by computing similarity. The present invention, however, combines the knowledge graph with a deep neural network and takes the daily behavior of the target entity into account, so that for different entities it can automatically generate comments that match the actual performance of each entity, improving the matching degree and accuracy of the comments.
Still taking a student as an example, before the winter or summer vacation the teacher needs to write a summary comment for every student. By clicking the mouse, the teacher can extract the knowledge graph of the student to be evaluated from the knowledge graph set; this knowledge graph records information such as the student's daily performance, e.g. classroom performance, self-image, social performance, emotional expression and final grades. The terminal executing the method of this embodiment determines the triple vector of the student based on the student's knowledge graph and inputs it into the deep neural network model, and the display interface of the terminal can automatically generate a comment matching the daily performance of the student. This scheme greatly saves the teacher's time and effort and avoids the problem that the comment does not match the student well because the teacher remembers the student's daily behavior inaccurately or incompletely.
Based on the scheme provided by the above embodiments of the present application, the target knowledge graph of the target entity is selected from the knowledge graph set, where the knowledge graph set characterizes the attribute values of at least one entity on preset attributes and the target entity is the object to be evaluated; the entity vector, attribute vector and attribute value vector of the target entity are determined based on the target knowledge graph, where the entity vector, attribute vector and attribute value vector are represented as a triple vector; and text matching the target entity is generated according to the entity vector, attribute vector and attribute value vector. Compared with the prior art, the present application builds a knowledge graph set from the daily behavior of multiple entities, extracts the triple vector of the target knowledge graph from it, and then generates the comment in combination with a deep learning algorithm. By combining the knowledge graph with deep learning, the scheme gives the deep learning algorithm access to all the attributes of the entity, thereby solving the technical problem in the related art that text generated solely by deep learning algorithms lacks personalized evaluation of an entity and does not match the entity's actual behavior well, achieving the goal of generating comments that best match the entity's daily behavior, and improving the matching degree of the comments.
Optionally, before step S102 of selecting the target knowledge graph of the target entity from the knowledge graph set, the method can also include step S101 of generating the knowledge graph set, where generating the knowledge graph set can specifically include the following steps:
Step S1012: construct the schema layer of the knowledge graph set, where the schema layer includes at least entity types, attribute types and attribute value types.
In an optional scheme, the schema layer can be edited with the ontology construction tool Protégé. Protégé is ontology-editing and knowledge-acquisition software developed in Java; the user only needs to build the ontology model at the concept level, which is simple and easy to operate. The schema layer is equivalent to the skeleton of the knowledge graph; it includes at least entity types, attribute types and attribute value types, and of course it can also include information such as time.
Step S1014: obtain record information, where the record information includes the attribute values of at least one entity on the preset attributes.
In an optional scheme, the record information can be entered manually into the computer terminal executing the method of this embodiment. For example: Li Ming is active in class, has a good image and a final grade of A, while another student loves to doze in class, is not socially active and has a final grade of B. In this way, the daily behavior of the target entity can be considered comprehensively when generating its text, avoiding missed features.
Step S1016: input the record information into the schema layer to generate the knowledge graph set.
In the above step, the entity information, attribute information and attribute value information obtained in step S1014 are filled into the corresponding entity types, attribute types and attribute value types of the schema layer constructed in step S1012; the knowledge graphs of all entities are constructed in this way and stored in the graph database Neo4j.
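Steps S1012 through S1016 can be sketched as a schema layer declaring the allowed types, with record information checked against it and filled in as triples. This is a minimal sketch under stated assumptions: the schema contents are illustrative, and the patent's Neo4j storage is replaced here by a plain Python list so the example stays self-contained.

```python
# Schema layer (step S1012): declares the entity and attribute types that
# record information must conform to.
schema = {
    "entity_types": {"student"},
    "attribute_types": {"classroom_performance", "self_image", "final_grade"},
}

def fill_schema(records):
    """Check record dicts against the schema and emit (entity, attribute, value)
    triples (steps S1014-S1016)."""
    kg_set = []
    for rec in records:
        assert rec["type"] in schema["entity_types"]
        for attr, value in rec["attributes"].items():
            assert attr in schema["attribute_types"]   # schema check
            kg_set.append((rec["name"], attr, value))
    return kg_set

records = [
    {"name": "Li Ming", "type": "student",
     "attributes": {"classroom_performance": "active", "final_grade": "A"}},
]
kg_set = fill_schema(records)
```

In a real deployment the emitted triples would be written to Neo4j as nodes and relationships rather than kept in a list.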
Optionally, before step S1016 of inputting the record information into the schema layer, the method can also include step S1015: preprocess the record information to obtain processed record information, where the preprocessing includes at least one of the following: entity extraction, attribute extraction, attribute value extraction and entity disambiguation.
In an optional scheme, the entity extraction, attribute extraction and attribute value extraction can be entity recognition, attribute recognition and attribute value recognition, including the detection and classification of entities, attributes and attribute values. It should be noted that entity disambiguation can distinguish the case where two different names refer to the same entity, or where the same name refers to two different entities.
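The entity-disambiguation part of step S1015 can be illustrated in its simplest form as mapping surface name variants onto one canonical entity id. The alias table below is a hypothetical example; a real disambiguation component would use context rather than a static lookup.

```python
# Toy disambiguation: different mentions of the same student resolve to one id.
ALIASES = {
    "Li Ming": "student_001",
    "Ming Li": "student_001",   # same entity, different name order
    "Li M.": "student_001",
    "Li Mei": "student_002",
}

def disambiguate(mention):
    """Resolve a surface mention to a canonical entity id (None if unknown)."""
    return ALIASES.get(mention)
```

After this normalization, all record information about one real-world entity lands in the same knowledge graph, which is what keeps the per-entity graphs consistent.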
Optionally, step S104 of determining the entity vector, attribute vector and attribute value vector of the target entity based on the target knowledge graph can specifically include the following steps:
Step S1042: extract the entity information, attribute information and attribute value information of the target entity from the target knowledge graph.
Step S1044: convert the entity information into a Boolean vector using a preset algorithm, and convert the attribute information and attribute value information into high-dimensional numeric vectors using a preset model, to obtain the triple vector.
In an optional scheme, the preset algorithm can be the one-hot (OneHot) algorithm, and the preset model can be a BERT model or a Word2Vector model. The BERT model, based on the bidirectional encoder representations of the Transformer, is suitable for building state-of-the-art models for a wide range of tasks.
When expressing the information of the triples in the target knowledge graph, the entity information, attribute information, and attribute value information are converted into numeric vectors that are easy for a neural network model to process; the neural network model thus has access to all attributes of the target entity and can extract high-dimensional attribute vector features. Specifically, multiple triples (e_i, p_ij, v_ij) of the target entity are extracted from the target knowledge graph, where e_i, p_ij, and v_ij respectively denote the i-th entity, the j-th attribute of the i-th entity, and the j-th attribute value of the i-th entity; e_i, p_ij, and v_ij are then characterized as the vectors V_ei, V_pij, and V_vij, respectively.
In an alternative embodiment, the entity e_i is characterized as a Boolean vector using the OneHot algorithm, and the attribute p_ij and attribute value v_ij are characterized as high-dimensional numeric vectors using the BERT model, i.e.:
V_ei = t(e_i), V_pij = s(p_ij), V_vij = s(v_ij)
where t and s denote the feature extraction function and the mapping function of a neural network structure, respectively.
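A minimal sketch of this vectorization, under stated assumptions: t is a one-hot encoding over a fixed (invented) entity vocabulary, and s stands in for the BERT or Word2Vec mapping, replaced here by a toy deterministic hash-based embedding so the example stays self-contained.

```python
# Sketch of step S1044: entity -> Boolean (one-hot) vector,
# attribute / attribute value -> numeric vector. The hash embedding is
# only a placeholder for the neural mapping function s.
import hashlib

ENTITIES = ["Li Ming", "Wang Wei", "Zhao Lei"]

def t(entity):                       # feature extraction function: one-hot
    return [1 if entity == e else 0 for e in ENTITIES]

def s(text, dim=8):                  # stand-in for the neural mapping function
    h = hashlib.md5(text.encode()).digest()
    return [b / 255.0 for b in h[:dim]]

V_e = t("Li Ming")                   # Boolean entity vector
V_p = s("classroom_performance")     # numeric attribute vector
V_v = s("positive")                  # numeric attribute-value vector
print(V_e, len(V_p), len(V_v))
```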
Optionally, step S106 of generating text matching the target entity according to the entity vector, attribute vector, and attribute value vector may include the following steps:
Step S1062: input the entity vector, attribute vector, and attribute value vector into a text generation model, where the text generation model includes a deep neural network model trained on triple samples and text samples.
As mentioned above, the deep neural network model may be a Seq2Seq model, a generative adversarial network model, or the like.
Step S1064: generate text matching the target entity based on the text generation model.
In this step, the entity vector V_ei, attribute vector V_pij, and attribute value vector V_vij are input into the text generation model, which produces a summarizing comment text y* about the target entity.
In an optional solution, the summarizing comment text y* may be represented as an output sequence y_1, …, y_T', where y_t' denotes the output character at time t', i.e.:
y_t' = arg max P(y_t' | y_1, …, y_{t'-1}, c_t')
In this formula, t' ∈ {1, …, T'}, c_t' denotes the context vector at time t', P(y_t' | y_1, …, y_{t'-1}, c_t') denotes the probability vector over all candidate texts at time t', and arg max selects, among the generated candidate texts, the one with the largest probability value.
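The arg max decoding rule can be illustrated with a toy greedy decoder. The vocabulary and the per-step probability table are invented purely for illustration; in the real model these probabilities come from the decoder at each time step.

```python
# Toy illustration of the arg-max rule above: at each step t', pick the
# candidate token with the largest probability, stopping at end-of-sequence.

VOCAB = ["positive", "diligent", "<eos>"]
STEP_PROBS = [                       # P(y_t' | history, c_t') per step
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
]

def greedy_decode(step_probs, vocab):
    out = []
    for probs in step_probs:
        token = vocab[max(range(len(vocab)), key=probs.__getitem__)]
        if token == "<eos>":
            break
        out.append(token)
    return out

assert greedy_decode(STEP_PROBS, VOCAB) == ["positive", "diligent"]
```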
Optionally, before step S1062 inputs the entity vector, attribute vector, and attribute value vector into the text generation model, the method may further include step S1061: generate the text generation model, which may include the following steps:
Step S10611: obtain triple samples and text samples.
In an optional solution, the triple samples and text samples may form an aligned corpus, expressed as {((e, p, v), y)} = {((e1, p1, v1), y1), …, ((ei, pi, vi), yi)}.
Step S10612: convert the entity samples in the triple samples into Boolean vectors using the preset algorithm, and convert the attribute samples and attribute value samples in the triple samples into high-dimensional numeric vectors using the preset model, to obtain triple vector samples.
As mentioned above, the preset algorithm may be the one-hot algorithm and the preset model may be a bidirectional encoder representation model; the process of converting the triple samples into triple vector samples is similar to step S1044 and is not repeated here.
Step S10613: train the text generation model based on the triple vector samples and the text samples to obtain a trained text generation model.
Once the aligned corpus of triples and comments has been constructed, the text generation model can be trained on it using a deep neural network algorithm. Because the text generation model is trained on the daily behavior data of all entities as its corpus, the above scheme can generate a summarizing comment that matches the daily behavior of a specific entity.
In an alternative embodiment, step S10613 of training the text generation model based on the triple vector samples and text samples to obtain the trained text generation model may include the following steps:
Step S106131: process the triple vector samples and text samples using an encoder combined with an attention mechanism to obtain a context vector.
In a model with an Encoder-Decoder structure there are two recurrent neural networks, one serving as the encoder and one as the decoder: the encoder turns a variable-length input sequence into a fixed-length vector, which can be regarded as the semantics of the sequence, and the decoder decodes this fixed-length vector into a variable-length output sequence. However, if the input sequence is very long, a fixed-length vector performs rather poorly, and an encoder combined with an attention (Attention) mechanism can solve this problem. Specifically, the context vector produced by the encoder combined with the attention mechanism is:
c_t' = f(h_t, y_{t'-1}, s_{t'-1})
where f denotes the encoding function, and h_t, y_{t'-1}, and s_{t'-1} respectively denote the hidden-layer output of the encoder at time t, the decoder output at time t'-1, and the decoder hidden state at time t'-1; c_t' is the context vector at time t'.
Step S106132: process the context vector using a decoder combined with the attention mechanism to obtain text information.
Considering that the characteristic information in the final context vector extracted by the encoder is limited and local features of the input are hard to capture, the output of the attention mechanism in the encoder needs to be combined as an input parameter of the decoder. Specifically, the output of the decoder combined with the attention mechanism is:
P(y_t' | y_1, …, y_{t'-1}, c_t') = g(y_{t'-1}, s_t', c_t')
where g denotes the decoding function, and y_t', y_{t'-1}, s_t', and c_t' respectively denote the output at time t', the output at time t'-1, the decoder hidden state at time t', and the context vector at time t'.
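The attention computation behind the context vector can be sketched numerically. The patent does not fix a scoring function, so a simple dot-product score between each encoder hidden state and the current decoder state is assumed here; the hidden-state values are invented for illustration.

```python
# Sketch of the attention step: c_t' is a softmax-weighted sum of encoder
# hidden states h_t, with weights from scores against the decoder state.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def context_vector(encoder_states, decoder_state):
    scores = [sum(h_i * s_i for h_i, s_i in zip(h, decoder_state))
              for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    return [sum(w * h[d] for w, h in zip(weights, encoder_states))
            for d in range(dim)]

H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # encoder hidden states h_1..h_3
s_prev = [1.0, 0.0]                         # decoder hidden state
c = context_vector(H, s_prev)
print(c)
```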
Step S106133: train the text generation model based on the text information by minimizing a loss function.
It should be noted that the training objective of the text generation model is to minimize its negative log-likelihood loss:
L(θ) = −Σ_{i=1}^{I} log P(y_i | x_i; θ)
where x_i and y_i respectively denote the i-th input text and output text, i ∈ {1, …, I}, and θ denotes the model parameters. The result of training is that the generated text is strongly correlated with the original text and grammatical errors in the text are minimized.
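The objective can be illustrated numerically. The per-pair probabilities below are invented for illustration; the point is only that the loss falls as the model assigns higher probability to the reference texts.

```python
# Numeric illustration of L(theta) = -sum_i log P(y_i | x_i; theta).
import math

def nll(probs):
    """probs: list of P(y_i | x_i) values, one per training pair."""
    return -sum(math.log(p) for p in probs)

bad_model  = [0.1, 0.2, 0.1]   # low probability on the reference comments
good_model = [0.8, 0.9, 0.7]   # high probability on the reference comments

assert nll(good_model) < nll(bad_model)
```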
Optionally, the preset algorithm in steps S1044 and S10612 is the one-hot algorithm, and the preset model is a BERT model or a Word2Vec model.
Taking students as an example again, Fig. 2 is a block diagram of the basic principle of a comment generation method according to an embodiment of the present application. As shown in Fig. 2, the teacher's records of each student's daily behavior data are first collected and filled into the designed planning layer of the knowledge graph, thereby constructing the knowledge graph set of all students' performance. When a comment needs to be generated for a student to be evaluated, the target knowledge graph of that student is extracted from the knowledge graph set and input into the trained text generation model, which then automatically outputs a summarizing comment on the student's daily performance. The detailed principle is shown in Fig. 3: a student's daily behavior data include classroom performance, self-image, social performance, emotional expression, and so on, and the planning layer of the knowledge graph defines entity types, attribute types, and attribute value types. When constructing the knowledge graph set, the students' daily behavior data are preprocessed through operations such as entity extraction, attribute extraction, attribute value extraction, and entity disambiguation, and then filled into the corresponding planning layer. When a given student is to be evaluated, that student's knowledge subgraph is first extracted, its triple information is then extracted and characterized in the form of triple vectors, and these are finally input into the trained text generation model to generate a candidate student comment; the teacher then confirms whether the comment needs modification, yielding the final student comment. The text generation model is obtained by training an Encoder-Decoder model combined with an attention mechanism on triple samples and comment samples.
In view of the above, in the embodiments of the present application, the target knowledge graph of the target entity is selected from the knowledge graph set, where the knowledge graph set characterizes attribute values of at least one entity on preset attributes and the target entity is the object to be evaluated; the entity vector, attribute vector, and attribute value vector of the target entity are determined based on the target knowledge graph and characterized as triple vectors; and text matching the target entity is generated according to the entity vector, attribute vector, and attribute value vector. Compared with the prior art, the present application establishes a knowledge graph set from the everyday performance of multiple entities, extracts the triple vectors of the target knowledge graph from it, and then applies a deep learning algorithm to generate a comment. By converting the entity information, attribute information, and attribute value information into numeric vectors that are easy for a neural network model to process, the model has access to all attributes of the target entity and can extract high-dimensional attribute vector features, and the Encoder-Decoder model combined with the attention mechanism improves the quality of the text output. This solves the technical problem in the related art that text generated purely by a deep learning algorithm lacks personalized comments on an entity, so that the text poorly matches the entity's actual performance; it achieves, to the greatest extent, the purpose of generating comments that match the entity's everyday performance, and improves the matching degree of the comments.
Embodiment 2
According to an embodiment of the present invention, another embodiment of a text generation method is provided from the perspective of a display interface. It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system containing, for example, a set of computer-executable instructions, and, although a logical order is shown in the flowcharts, in some cases the steps may be executed in an order different from the one shown or described herein.
Fig. 4 shows another text generation method according to an embodiment of the present invention. As shown in Fig. 4, the method includes the following steps:
Step S402: receive a selection instruction, where the selection instruction is used to select the target entity to be evaluated.
In an optional solution, the selection instruction may be triggered by the teacher with a mouse click or by touching a touch screen; the target entity may be any object to be evaluated, such as a student, an institution, or a company employee.
Step S404: display text matching the target entity, where the text is generated according to the entity vector, attribute vector, and attribute value vector of the target entity determined from the target knowledge graph of the target entity; the target knowledge graph comes from a knowledge graph set; the knowledge graph set characterizes attribute values of at least one entity on preset attributes; and the entity vector, attribute vector, and attribute value vector are characterized as triple vectors.
In an optional solution, the entity may be any object that needs to be evaluated, such as a student, an institution, or a company employee. For a student, the preset attributes may be classroom performance, self-image, social performance, emotional expression, weekly test scores, final grades, and so on, with corresponding attribute values such as positive, neat, active, stable, fluctuating, or excellent. For an institution, the preset attributes may be brand image, number of granted patents, annual revenue, public-welfare activity, and so on, with corresponding attribute values such as highly influential, more than 100, 200 million, or active. The text generation model that generates the text may be a deep neural network model.
A knowledge graph (Knowledge Graph, KG), as a new knowledge organization and retrieval technique of the big data era, is used to describe concepts in the physical world and their interrelations in symbolic form. A knowledge graph set gathers the knowledge graphs of multiple entities; the knowledge graph of each entity records that entity's daily behavior, and since each entity is an independent individual, the knowledge graph of each entity naturally differs. When an entity needs to be evaluated, i.e. when it becomes the target entity, its target knowledge graph is selected from the knowledge graph set.
By extracting the entity information, attribute information, and attribute value information of the target entity from the target knowledge graph and converting them into an entity vector, attribute vector, and attribute value vector that are easy for the text generation model to process, the matching degree of the generated text can be greatly improved.
It should be noted that deep neural networks are an interdisciplinary subject combining mathematics and computer science. Unlike classical machine learning, deep neural networks achieve end-to-end high-dimensional feature extraction and abstraction from data, which solves the difficulty of feature extraction in machine learning. Typical examples are the Seq2Seq model and generative adversarial network models.
Seq2Seq is a model with an Encoder-Decoder structure. Its basic idea is to use two recurrent neural networks, one as the encoder and one as the decoder: the encoder turns a variable-length input sequence into a fixed-length vector, which can be regarded as the semantics of the sequence, and the decoder decodes this fixed-length vector into a variable-length output sequence. A generative adversarial network (Generative Adversarial Networks, GAN) model includes at least two modules, a generative model and an adversarial model, which learn through a mutual game and together produce fairly good output. Applying these two kinds of deep neural network algorithms to the field of comment generation can therefore achieve results that are more accurate and more robust than those of classical machine learning methods.
In this step, after the terminal detects the selection instruction of clicking the target entity on the display interface, the comment text matching the target entity is displayed on the display interface.
It is easy to note that, in the existing text generation field, even text generation based on knowledge graphs does not make full use of the entity information, attribute information, and attribute values of the knowledge graph itself; instead, the knowledge graph serves as an intermediary, and suitable text is found by search or by computing similarity. The present invention, however, combines the knowledge graph with a deep neural network and takes the daily behavior of the target entity into account, so that for different entities it can automatically generate comments that match each entity's actual performance, improving the matching degree and accuracy of the comments.
According to the scheme provided by the above embodiments of the present application, a selection instruction is first received, where the selection instruction is used to select the target entity to be evaluated; then text matching the target entity is displayed, where the text is generated according to the entity vector, attribute vector, and attribute value vector of the target entity determined from the target knowledge graph of the target entity, the target knowledge graph comes from a knowledge graph set characterizing attribute values of at least one entity on preset attributes, and the entity vector, attribute vector, and attribute value vector are characterized as triple vectors. Compared with the prior art, the present application establishes a knowledge graph set from the everyday performance of multiple entities, extracts the triple vectors of the target knowledge graph from it, and then applies a deep learning algorithm to generate a comment. By combining the knowledge graph with deep learning, the deep learning algorithm has access to all attributes of an entity, which solves the technical problem in the related art that text generated purely by a deep learning algorithm lacks personalized comments on an entity and thus poorly matches the entity's actual performance; it achieves, to the greatest extent, the purpose of generating comments that match the entity's everyday performance, and improves the matching degree of the comments.
Optionally, before step S404 displays the text matching the target entity, the method may further include step S403: generate the knowledge graph set, which may include the following steps:
Step S4032: construct the planning layer of the knowledge graph set, where the planning layer includes at least entity types, attribute types, and attribute value types.
In an optional solution, the planning layer may be edited with the ontology construction tool Protégé. Protégé is ontology editing and knowledge acquisition software developed in the Java language; the user only needs to build the ontology model at the concept level, and it is simple to operate.
The planning layer is equivalent to the skeleton of the knowledge graph; it includes at least entity types, attribute types, and attribute value types, and may of course also include information such as time.
Step S4034 obtains record information, wherein record information includes: category of at least one entity on preset attribute
Property value.
In a kind of optinal plan, above-mentioned record information can be by being manually entered to the computer for executing the present embodiment method
In terminal.For example, the classroom Li Ming shows positive, good, final grade A of image etc., big classroom performance love doze, social activity performance
Not positive, final grade B etc..In this way, can consider the daily behavior of target entity comprehensively when generating the text of target entity
Performance, avoids missing feature.
Step S4036: input the record information into the planning layer to generate the knowledge graph set.
In this step, the entity information, attribute information, and attribute value information are filled into the corresponding entity types, attribute types, and attribute value types of the constructed planning layer. The knowledge graph set of all entities is built in this way and stored in the graph database Neo4j.
Optionally, before step S4036 inputs the record information into the planning layer, the method may further include:
Step S4035: preprocess the record information to obtain processed record information, where the preprocessing includes at least one of the following: entity extraction, attribute extraction, attribute value extraction, and entity disambiguation.
In an optional solution, the entity extraction, attribute extraction, and attribute value extraction may be implemented as entity recognition, attribute recognition, and attribute value recognition, including the detection and classification of entities, attributes, and attribute values.
It should be noted that entity disambiguation distinguishes the case where two different names refer to the same entity from the case where the same name refers to two different entities.
Optionally, determining the entity vector, attribute vector, and attribute value vector of the target entity from the target knowledge graph in step S404 may include the following steps:
Step S4041: extract the entity information, attribute information, and attribute value information of the target entity from the target knowledge graph.
Step S4042: convert the entity information into a Boolean vector using a preset algorithm, and convert the attribute information and attribute value information into high-dimensional numeric vectors using a preset model, to obtain triple vectors.
In an optional solution, the preset algorithm may be the one-hot algorithm, and the preset model may be a BERT model or a Word2Vec model. The BERT model, based on bidirectional encoder representations from Transformers, is suited to building state-of-the-art models for a wide range of tasks.
When expressing the information of the triples in the target knowledge graph, the entity information, attribute information, and attribute value information are converted into numeric vectors that are easy for a neural network model to process; the neural network model thus has access to all attributes of the target entity and can extract high-dimensional attribute vector features. Specifically, multiple triples (e_i, p_ij, v_ij) of the target entity are extracted from the target knowledge graph, where e_i, p_ij, and v_ij respectively denote the i-th entity, the j-th attribute of the i-th entity, and the j-th attribute value of the i-th entity; e_i, p_ij, and v_ij are then characterized as the vectors V_ei, V_pij, and V_vij, respectively.
In an alternative embodiment, the entity e_i is characterized as a Boolean vector using the OneHot algorithm, and the attribute p_ij and attribute value v_ij are characterized as high-dimensional numeric vectors using the BERT model, i.e.:
V_ei = t(e_i), V_pij = s(p_ij), V_vij = s(v_ij)
where t and s denote the feature extraction function and the mapping function of a neural network structure, respectively.
Optionally, the step in step S404 of generating the text according to the entity vector, attribute vector, and attribute value vector may include the following steps:
Step S4046: input the entity vector, attribute vector, and attribute value vector into a text generation model, where the text generation model includes a deep neural network model trained on triple samples and text samples.
As mentioned above, the deep neural network model may be a Seq2Seq model, a generative adversarial network model, or the like.
Step S4047: generate text matching the target entity based on the text generation model.
In this step, the entity vector V_ei, attribute vector V_pij, and attribute value vector V_vij are input into the text generation model, which produces a summarizing comment text y* about the target entity.
In an optional solution, the summarizing comment text y* may be represented as an output sequence y_1, …, y_T', where y_t' denotes the output character at time t', i.e.:
y_t' = arg max P(y_t' | y_1, …, y_{t'-1}, c_t')
In this formula, t' ∈ {1, …, T'}, c_t' denotes the context vector at time t', P(y_t' | y_1, …, y_{t'-1}, c_t') denotes the probability vector over all candidate texts at time t', and arg max selects, among the generated candidate texts, the one with the largest probability value.
Optionally, before step S4046 inputs the entity vector, attribute vector, and attribute value vector into the text generation model, the method may further include step S4045: generate the text generation model, which may include the following steps:
Step S40451: obtain triple samples and text samples.
In an optional solution, the triple samples and text samples may form an aligned corpus, expressed as {((e, p, v), y)} = {((e1, p1, v1), y1), …, ((ei, pi, vi), yi)}.
Step S40452: convert the entity samples in the triple samples into Boolean vectors using the preset algorithm, and convert the attribute samples and attribute value samples in the triple samples into high-dimensional numeric vectors using the preset model, to obtain triple vector samples.
As mentioned above, the preset algorithm may be the one-hot algorithm and the preset model may be a bidirectional encoder representation model; the process of converting the triple samples into triple vector samples is similar to step S1044 and is not repeated here.
Step S40453: train the text generation model based on the triple vector samples and the text samples to obtain a trained text generation model.
Once the aligned corpus of triples and comments has been constructed, the text generation model can be trained on it using a deep neural network algorithm. Because the text generation model is trained on the daily behavior data of all entities as its corpus, the above scheme can generate a summarizing comment that matches the daily behavior of a specific entity.
In an alternative embodiment, step S40453 of training the text generation model based on the triple vector samples and text samples to obtain the trained text generation model may include the following steps:
Step S404531: process the triple vector samples and text samples using an encoder combined with an attention mechanism to obtain a context vector.
In a model with an Encoder-Decoder structure there are two recurrent neural networks, one serving as the encoder and one as the decoder: the encoder turns a variable-length input sequence into a fixed-length vector, which can be regarded as the semantics of the sequence, and the decoder decodes this fixed-length vector into a variable-length output sequence. However, if the input sequence is very long, a fixed-length vector performs rather poorly, and an encoder combined with an attention (Attention) mechanism can solve this problem. Specifically, the context vector produced by the encoder combined with the attention mechanism is:
c_t' = f(h_t, y_{t'-1}, s_{t'-1})
where f denotes the encoding function, and h_t, y_{t'-1}, and s_{t'-1} respectively denote the hidden-layer output of the encoder at time t, the decoder output at time t'-1, and the decoder hidden state at time t'-1; c_t' is the context vector at time t'.
Step S404532: process the context vector using a decoder combined with the attention mechanism to obtain text information.
Considering that the characteristic information in the final context vector extracted by the encoder is limited and local features of the input are hard to capture, the output of the attention mechanism in the encoder needs to be combined as an input parameter of the decoder. Specifically, the output of the decoder combined with the attention mechanism is:
P(y_t' | y_1, …, y_{t'-1}, c_t') = g(y_{t'-1}, s_t', c_t')
where g denotes the decoding function, and y_t', y_{t'-1}, s_t', and c_t' respectively denote the output at time t', the output at time t'-1, the decoder hidden state at time t', and the context vector at time t'.
Step S404533: train the text generation model based on the text information by minimizing a loss function.
It should be noted that the training objective of the text generation model is to minimize its negative log-likelihood loss:
L(θ) = −Σ_{i=1}^{I} log P(y_i | x_i; θ)
where x_i and y_i respectively denote the i-th input text and output text, i ∈ {1, …, I}, and θ denotes the model parameters. The result of training is that the generated text is strongly correlated with the original text and grammatical errors in the text are minimized.
Optionally, the preset algorithm in steps S4042 and S40452 is the one-hot algorithm, and the preset model is a BERT model or a Word2Vec model.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, can be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
Embodiment 3
According to an embodiment of the present invention, a text generation device is provided. Fig. 5 is a schematic diagram of the text generation device according to an embodiment of the present application. As shown in Fig. 5, the device 500 includes a selecting module 502, a determining module 504, and a text generation module 506.
The selecting module 502 is configured to select the target knowledge graph of the target entity from the knowledge graph set, where the knowledge graph set characterizes attribute values of at least one entity on preset attributes and the target entity is the object to be evaluated. The determining module 504 is configured to determine the entity vector, attribute vector, and attribute value vector of the target entity based on the target knowledge graph, where the entity vector, attribute vector, and attribute value vector are characterized as triple vectors. The text generation module 506 is configured to generate text matching the target entity according to the entity vector, attribute vector, and attribute value vector.
Optionally, above-mentioned apparatus can also include: map generation module, for concentrating selection target real from knowledge mapping
Before the object knowledge map of body, knowledge mapping collection is generated, wherein map generation module includes: building module, is known for constructing
Know the planning layer of atlas, wherein planning layer includes at least: entity type, attribute type and attribute Value Types;First obtains mould
Block, for obtaining record information, wherein record information includes: attribute value of at least one entity on preset attribute;It will record
Information input is into planning layer, and map generates submodule, for generating knowledge mapping collection.
Optionally, above-mentioned apparatus can also include: preprocessing module, for will record information input to planning layer it
Before, record information is pre-processed, the record information that obtains that treated, wherein pretreatment includes at least one following: entity
Extraction, attribute extraction, attribute value extracts and entity disambiguates.
Optionally, the determining module includes: an extraction module, configured to extract the entity information, attribute information, and attribute value information of the target entity in the target knowledge graph; and a first conversion module, configured to convert the entity information into a Boolean vector using a preset algorithm and convert the attribute information and the attribute value information into high-dimensional numeric vectors using a preset model, to obtain the triple vector.
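A minimal sketch of the two conversions, assuming a small fixed entity vocabulary: the entity becomes a Boolean one-hot vector, while the attribute and attribute value become dense numeric vectors. A real system would obtain the dense vectors from BERT or Word2Vec; the CRC-seeded random embedding below is only a deterministic stand-in, and the vocabulary and dimension are illustrative.

```python
import zlib
import numpy as np

ENTITIES = ["Math101", "Physics201", "History110"]  # assumed vocabulary

def one_hot(entity):
    # Boolean vector with a single True at the entity's vocabulary index.
    vec = np.zeros(len(ENTITIES), dtype=bool)
    vec[ENTITIES.index(entity)] = True
    return vec

def dense_embed(text, dim=8):
    # Stand-in for a BERT/Word2Vec embedding: deterministic per string.
    rng = np.random.default_rng(zlib.crc32(text.encode("utf-8")))
    return rng.standard_normal(dim)

def triple_vector(entity, attribute, value):
    return one_hot(entity), dense_embed(attribute), dense_embed(str(value))

e, a, v = triple_vector("Physics201", "difficulty", "hard")
print(e.tolist())        # → [False, True, False]
print(a.shape, v.shape)  # → (8,) (8,)
```

The asymmetry mirrors the text: entities are drawn from a closed set (one-hot suffices), while attributes and values benefit from embeddings that capture semantic similarity.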
Optionally, the text generation module includes: an input module, configured to input the entity vector, attribute vector, and attribute value vector into a text generation model, where the text generation model includes a deep neural network model trained on triple samples and text samples; and a text generation submodule, configured to generate text matching the target entity based on the text generation model.
Optionally, the above device may further include a model generation module, configured to generate the text generation model before the entity vector, attribute vector, and attribute value vector are input into it. The model generation module includes: a second obtaining module, configured to obtain the triple samples and text samples; a second conversion module, configured to convert the entity samples in the triple samples into Boolean vectors using the preset algorithm and convert the attribute samples and attribute value samples in the triple samples into high-dimensional numeric vectors using the preset model, to obtain triple vector samples; and a training module, configured to train the text generation model based on the triple vector samples and the text samples, to obtain a trained text generation model.
Optionally, the training module includes: an encoding module, configured to process the triple vector samples and text samples with an encoder combined with an attention mechanism, to obtain a context vector; a decoding module, configured to process the context vector with a decoder combined with the attention mechanism, to obtain text information; and a training submodule, configured to train the text generation model based on the text information by minimizing a loss function.
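The attention step inside that encoder-decoder reduces to a few lines. The NumPy sketch below shows only the context-vector computation (score each encoder position, softmax, weighted sum); the deep network, the decoder, and the loss minimization around it are omitted, and all dimensions and values are illustrative assumptions.

```python
import numpy as np

def attention_context(encoder_states, decoder_state):
    """Dot-product attention: score each encoder position against the
    current decoder state, softmax the scores, return the weighted sum."""
    scores = encoder_states @ decoder_state   # one score per position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over positions
    return weights @ encoder_states           # context vector

enc = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3 encoder positions
dec = np.array([1.0, 0.0])                             # current decoder state
ctx = attention_context(enc, dec)
print(ctx.shape)  # → (2,)
```

At each decoding step the context vector is recomputed, so positions of the triple that match the current decoder state contribute more to the next generated token.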
Optionally, the above preset algorithm is a one-hot encoding algorithm, and the preset model is a BERT model or a Word2Vec model.
It should be noted that the above selecting module 502, determining module 504, and text generation module 506 correspond to steps S102 to S106 in Embodiment 1. The three modules implement the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in Embodiment 1.
Embodiment 4
According to an embodiment of the present invention, another text generation device is provided. Fig. 6 is a schematic diagram of the text generation device according to an embodiment of the present application. As shown in Fig. 6, the device 600 includes a receiving module 602 and a display module 604.
The receiving module 602 is configured to receive a selection instruction, where the selection instruction is used to select a target entity to be evaluated. The display module 604 is configured to display text matching the target entity, where the text is generated according to the entity vector, attribute vector, and attribute value vector of the target entity determined from the target knowledge graph of the target entity; the target knowledge graph comes from a knowledge graph set, the knowledge graph set is used to characterize attribute values of at least one entity on preset attributes, and the entity vector, attribute vector, and attribute value vector are characterized as a triple vector.
Optionally, the above device may further include a graph generation module, configured to generate the knowledge graph set before the text matching the target entity is displayed. The graph generation module may include: a construction module, configured to construct the planning layer of the knowledge graph set, where the planning layer includes at least entity types, attribute types, and attribute value types; a first obtaining module, configured to obtain record information, where the record information includes attribute values of at least one entity on preset attributes; and a graph generation submodule, configured to input the record information into the planning layer and generate the knowledge graph set.
Optionally, the above device may further include a preprocessing module, configured to preprocess the record information before it is input into the planning layer, to obtain processed record information, where the preprocessing includes at least one of the following: entity extraction, attribute extraction, attribute value extraction, and entity disambiguation.
Optionally, the display module further includes a determining module, configured to determine the entity vector, attribute vector, and attribute value vector of the target entity according to the target knowledge graph. The determining module may include: an extraction module, configured to extract the entity information, attribute information, and attribute value information of the target entity in the target knowledge graph; and a first conversion module, configured to convert the entity information into a Boolean vector using a preset algorithm and convert the attribute information and the attribute value information into high-dimensional numeric vectors using a preset model, to obtain the triple vector.
Optionally, the display module further includes a text generation module, configured to generate the text according to the entity vector, attribute vector, and attribute value vector. The text generation module may include: an input module, configured to input the entity vector, attribute vector, and attribute value vector into a text generation model, where the text generation model includes a deep neural network model trained on triple samples and text samples; and a text generation submodule, configured to generate text matching the target entity based on the text generation model.
Optionally, the above device may further include a model generation module, configured to generate the text generation model before the entity vector, attribute vector, and attribute value vector are input into it. The model generation module may include: a second obtaining module, configured to obtain the triple samples and text samples; a second conversion module, configured to convert the entity samples in the triple samples into Boolean vectors using the preset algorithm and convert the attribute samples and attribute value samples in the triple samples into high-dimensional numeric vectors using the preset model, to obtain triple vector samples; and a training module, configured to train the text generation model based on the triple vector samples and the text samples, to obtain a trained text generation model.
Optionally, the training module may include: an encoding module, configured to process the triple vector samples and text samples with an encoder combined with an attention mechanism, to obtain a context vector; a decoding module, configured to process the context vector with a decoder combined with the attention mechanism, to obtain text information; and a training submodule, configured to train the text generation model based on the text information by minimizing a loss function.
Optionally, the above preset algorithm is a one-hot encoding algorithm, and the preset model is a BERT model or a Word2Vec model.
It should be noted that the above receiving module 602 and display module 604 correspond to steps S402 to S404 in Embodiment 2. The two modules implement the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in Embodiment 2.
Embodiment 5
According to an embodiment of the present invention, a storage medium is provided. The storage medium includes a stored program, where when the program runs, the device on which the storage medium is located is controlled to execute the text generation method of Embodiment 1 or 2.
Embodiment 6
According to an embodiment of the present invention, a processor is provided. The processor is configured to run a program, where when the program runs, the text generation method of Embodiment 1 or 2 is executed.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units may be a division by logical function; in actual implementation there may be other divisions, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through interfaces; the indirect coupling or communication connection between units or modules may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention — in essence, the part that contributes to the prior art, or all or part of the technical solution — may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principles of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (11)
1. A text generation method, comprising:
selecting a target knowledge graph of a target entity from a knowledge graph set, wherein the knowledge graph set is used to characterize attribute values of at least one entity on preset attributes, and the target entity is an object to be evaluated;
determining an entity vector, an attribute vector, and an attribute value vector of the target entity based on the target knowledge graph, wherein the entity vector, attribute vector, and attribute value vector are characterized as a triple vector; and
generating text matching the target entity according to the entity vector, attribute vector, and attribute value vector.
2. The method according to claim 1, wherein before the target knowledge graph of the target entity is selected from the knowledge graph set, the method further comprises generating the knowledge graph set, and the step of generating the knowledge graph set comprises:
constructing a planning layer of the knowledge graph set, wherein the planning layer comprises at least: entity types, attribute types, and attribute value types;
obtaining record information, wherein the record information comprises attribute values of at least one entity on preset attributes; and
inputting the record information into the planning layer to generate the knowledge graph set.
3. The method according to claim 2, wherein before the record information is input into the planning layer, the method further comprises:
preprocessing the record information to obtain processed record information, wherein the preprocessing comprises at least one of the following: entity extraction, attribute extraction, attribute value extraction, and entity disambiguation.
4. The method according to claim 1, wherein determining the entity vector, attribute vector, and attribute value vector of the target entity based on the target knowledge graph comprises:
extracting entity information, attribute information, and attribute value information of the target entity in the target knowledge graph; and
converting the entity information into a Boolean vector using a preset algorithm, and converting the attribute information and the attribute value information into high-dimensional numeric vectors using a preset model, to obtain the triple vector.
5. The method according to claim 1, wherein generating text matching the target entity according to the entity vector, attribute vector, and attribute value vector comprises:
inputting the entity vector, attribute vector, and attribute value vector into a text generation model, wherein the text generation model comprises a deep neural network model trained on triple samples and text samples; and
generating text matching the target entity based on the text generation model.
6. The method according to claim 5, wherein before the entity vector, attribute vector, and attribute value vector are input into the text generation model, the method further comprises generating the text generation model, and the step of generating the text generation model comprises:
obtaining the triple samples and the text samples;
converting entity samples in the triple samples into Boolean vectors using the preset algorithm, and converting attribute samples and attribute value samples in the triple samples into high-dimensional numeric vectors using the preset model, to obtain triple vector samples; and
training the text generation model based on the triple vector samples and the text samples, to obtain a trained text generation model.
7. The method according to claim 5, wherein training the text generation model based on the triple vector samples and the text samples to obtain a trained text generation model comprises:
processing the triple vector samples and the text samples with an encoder combined with an attention mechanism, to obtain a context vector;
processing the context vector with a decoder combined with the attention mechanism, to obtain text information; and
training the text generation model based on the text information by minimizing a loss function.
8. A text generation method, comprising:
receiving a selection instruction, wherein the selection instruction is used to select a target entity to be evaluated; and
displaying text matching the target entity, wherein the text is generated according to an entity vector, an attribute vector, and an attribute value vector of the target entity determined from a target knowledge graph of the target entity, the target knowledge graph comes from a knowledge graph set, the knowledge graph set is used to characterize attribute values of at least one entity on preset attributes, and the entity vector, attribute vector, and attribute value vector are characterized as a triple vector.
9. A text generation device, comprising:
a selecting module, configured to select a target knowledge graph of a target entity from a knowledge graph set, wherein the knowledge graph set is used to characterize attribute values of at least one entity on preset attributes, and the target entity is an object to be evaluated;
a determining module, configured to determine an entity vector, an attribute vector, and an attribute value vector of the target entity based on the target knowledge graph, wherein the entity vector, attribute vector, and attribute value vector are characterized as a triple vector; and
a text generation module, configured to generate text matching the target entity according to the entity vector, attribute vector, and attribute value vector.
10. A storage medium, comprising a stored program, wherein when the program runs, a device on which the storage medium is located is controlled to execute the text generation method of claim 1 or 8.
11. A processor, configured to run a program, wherein when the program runs, the text generation method of claim 1 or 8 is executed.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910775353.1A CN110489755A (en) | 2019-08-21 | 2019-08-21 | Document creation method and device |
PCT/CN2019/126797 WO2021031480A1 (en) | 2019-08-21 | 2019-12-20 | Text generation method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910775353.1A CN110489755A (en) | 2019-08-21 | 2019-08-21 | Document creation method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110489755A true CN110489755A (en) | 2019-11-22 |
Family
ID=68552697
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910775353.1A Pending CN110489755A (en) | 2019-08-21 | 2019-08-21 | Document creation method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110489755A (en) |
WO (1) | WO2021031480A1 (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111061152A (en) * | 2019-12-23 | 2020-04-24 | 深圳供电局有限公司 | Attack recognition method based on deep neural network and intelligent energy power control device |
CN111209389A (en) * | 2019-12-31 | 2020-05-29 | 天津外国语大学 | Movie story generation method |
CN111897955A (en) * | 2020-07-13 | 2020-11-06 | 广州视源电子科技股份有限公司 | Comment generation method, device and equipment based on coding and decoding and storage medium |
CN111930959A (en) * | 2020-07-14 | 2020-11-13 | 上海明略人工智能(集团)有限公司 | Method and device for generating text by using map knowledge |
CN112036146A (en) * | 2020-08-25 | 2020-12-04 | 广州视源电子科技股份有限公司 | Comment generation method and device, terminal device and storage medium |
CN112069781A (en) * | 2020-08-27 | 2020-12-11 | 广州视源电子科技股份有限公司 | Comment generation method and device, terminal device and storage medium |
WO2021031480A1 (en) * | 2019-08-21 | 2021-02-25 | 广州视源电子科技股份有限公司 | Text generation method and device |
CN113111188A (en) * | 2021-04-14 | 2021-07-13 | 清华大学 | Text generation method and system |
CN113157941A (en) * | 2021-04-08 | 2021-07-23 | 支付宝(杭州)信息技术有限公司 | Service characteristic data processing method, service characteristic data processing device, text generating method, text generating device and electronic equipment |
CN113488165A (en) * | 2021-07-26 | 2021-10-08 | 平安科技(深圳)有限公司 | Text matching method, device and equipment based on knowledge graph and storage medium |
CN113569554A (en) * | 2021-09-24 | 2021-10-29 | 北京明略软件系统有限公司 | Entity pair matching method and device in database, electronic equipment and storage medium |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113158189B (en) * | 2021-04-28 | 2023-12-26 | 绿盟科技集团股份有限公司 | Method, device, equipment and medium for generating malicious software analysis report |
CN113239203A (en) * | 2021-06-02 | 2021-08-10 | 北京金山数字娱乐科技有限公司 | Knowledge graph-based screening method and device |
CN113609291A (en) * | 2021-07-27 | 2021-11-05 | 科大讯飞(苏州)科技有限公司 | Entity classification method and device, electronic equipment and storage medium |
CN113761167B (en) * | 2021-09-09 | 2023-10-20 | 上海明略人工智能(集团)有限公司 | Session information extraction method, system, electronic equipment and storage medium |
CN114186063B (en) * | 2021-12-14 | 2024-06-21 | 合肥工业大学 | Training method and classification method of cross-domain text emotion classification model |
CN114860945B (en) * | 2022-02-14 | 2024-09-27 | 合肥工业大学 | High-quality noise detection method and device based on rule information |
CN116028647A (en) * | 2023-02-07 | 2023-04-28 | 中科乐听智能技术(济南)有限公司 | Knowledge-graph-based fusion education intelligent comment method and system |
CN116306925B (en) * | 2023-03-14 | 2024-05-03 | 中国人民解放军总医院 | Method and system for generating end-to-end entity link |
CN116150929B (en) * | 2023-04-17 | 2023-07-07 | 中南大学 | Construction method of railway route selection knowledge graph |
CN116452072B (en) * | 2023-06-19 | 2023-08-29 | 华南师范大学 | Teaching evaluation method, system, equipment and readable storage medium |
CN117332282B (en) * | 2023-11-29 | 2024-03-08 | 之江实验室 | Knowledge graph-based event matching method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170039275A1 (en) * | 2015-08-03 | 2017-02-09 | International Business Machines Corporation | Automated Article Summarization, Visualization and Analysis Using Cognitive Services |
CN108763336A (en) * | 2018-05-12 | 2018-11-06 | 北京无忧创新科技有限公司 | A kind of visa self-help serving system |
CN109684394A (en) * | 2018-12-13 | 2019-04-26 | 北京百度网讯科技有限公司 | Document creation method, device, equipment and storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11163836B2 (en) * | 2018-02-12 | 2021-11-02 | International Business Machines Corporation | Extraction of information and smart annotation of relevant information within complex documents |
CN108345690B (en) * | 2018-03-09 | 2020-11-13 | 广州杰赛科技股份有限公司 | Intelligent question and answer method and system |
CN109189944A (en) * | 2018-09-27 | 2019-01-11 | 桂林电子科技大学 | Personalized recommending scenery spot method and system based on user's positive and negative feedback portrait coding |
CN110489755A (en) * | 2019-08-21 | 2019-11-22 | 广州视源电子科技股份有限公司 | Document creation method and device |
2019
- 2019-08-21 CN CN201910775353.1A patent/CN110489755A/en active Pending
- 2019-12-20 WO PCT/CN2019/126797 patent/WO2021031480A1/en active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170039275A1 (en) * | 2015-08-03 | 2017-02-09 | International Business Machines Corporation | Automated Article Summarization, Visualization and Analysis Using Cognitive Services |
CN108763336A (en) * | 2018-05-12 | 2018-11-06 | 北京无忧创新科技有限公司 | A kind of visa self-help serving system |
CN109684394A (en) * | 2018-12-13 | 2019-04-26 | 北京百度网讯科技有限公司 | Document creation method, device, equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
Cai Yuanyuan: "Semantic Computing Technology and Applications Based on Knowledge Integration in a Big Data Environment", 31 August 2018, Beijing Institute of Technology Press *
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021031480A1 (en) * | 2019-08-21 | 2021-02-25 | 广州视源电子科技股份有限公司 | Text generation method and device |
CN111061152A (en) * | 2019-12-23 | 2020-04-24 | 深圳供电局有限公司 | Attack recognition method based on deep neural network and intelligent energy power control device |
CN111209389B (en) * | 2019-12-31 | 2023-08-11 | 天津外国语大学 | Movie story generation method |
CN111209389A (en) * | 2019-12-31 | 2020-05-29 | 天津外国语大学 | Movie story generation method |
CN111897955A (en) * | 2020-07-13 | 2020-11-06 | 广州视源电子科技股份有限公司 | Comment generation method, device and equipment based on coding and decoding and storage medium |
CN111897955B (en) * | 2020-07-13 | 2024-04-09 | 广州视源电子科技股份有限公司 | Comment generation method, device, equipment and storage medium based on encoding and decoding |
CN111930959A (en) * | 2020-07-14 | 2020-11-13 | 上海明略人工智能(集团)有限公司 | Method and device for generating text by using map knowledge |
CN111930959B (en) * | 2020-07-14 | 2024-02-09 | 上海明略人工智能(集团)有限公司 | Method and device for generating text by map knowledge |
CN112036146A (en) * | 2020-08-25 | 2020-12-04 | 广州视源电子科技股份有限公司 | Comment generation method and device, terminal device and storage medium |
CN112036146B (en) * | 2020-08-25 | 2024-08-27 | 广州视源电子科技股份有限公司 | Comment generation method and device, terminal equipment and storage medium |
CN112069781B (en) * | 2020-08-27 | 2024-01-02 | 广州视源电子科技股份有限公司 | Comment generation method and device, terminal equipment and storage medium |
CN112069781A (en) * | 2020-08-27 | 2020-12-11 | 广州视源电子科技股份有限公司 | Comment generation method and device, terminal device and storage medium |
CN113157941A (en) * | 2021-04-08 | 2021-07-23 | 支付宝(杭州)信息技术有限公司 | Service characteristic data processing method, service characteristic data processing device, text generating method, text generating device and electronic equipment |
CN113111188B (en) * | 2021-04-14 | 2022-08-09 | 清华大学 | Text generation method and system |
CN113111188A (en) * | 2021-04-14 | 2021-07-13 | 清华大学 | Text generation method and system |
CN113488165A (en) * | 2021-07-26 | 2021-10-08 | 平安科技(深圳)有限公司 | Text matching method, device and equipment based on knowledge graph and storage medium |
CN113488165B (en) * | 2021-07-26 | 2023-08-22 | 平安科技(深圳)有限公司 | Text matching method, device, equipment and storage medium based on knowledge graph |
CN113569554A (en) * | 2021-09-24 | 2021-10-29 | 北京明略软件系统有限公司 | Entity pair matching method and device in database, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2021031480A1 (en) | 2021-02-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20191122 |