CN110309315A - Generation method, device, computer-readable medium and the electronic equipment of template file - Google Patents

Info

Publication number
CN110309315A
CN110309315A CN201810367499.8A CN201810367499A CN110309315A CN 110309315 A CN110309315 A CN 110309315A CN 201810367499 A CN201810367499 A CN 201810367499A CN 110309315 A CN110309315 A CN 110309315A
Authority
CN
China
Prior art keywords
entity
entity name
corpus data
default
template file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810367499.8A
Other languages
Chinese (zh)
Other versions
CN110309315B (en)
Inventor
周辉阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201810367499.8A
Publication of CN110309315A
Application granted
Publication of CN110309315B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the present invention provide a method and an apparatus for generating a template file, a computer-readable medium, and an electronic device. The generation method includes: detecting the preset entity names contained in corpus data; determining, according to the correspondence between entity names and entity tags, the target entity tag corresponding to each preset entity name; and replacing the preset entity names contained in the corpus data with the target entity tags to generate a template file of the corpus data. If the corpus data contains multiple preset entity names whose characters overlap, the corresponding entity name in the corpus data is replaced separately with the target entity tag of each of these preset entity names, so that multiple template files of the corpus data are generated. When entity names with overlapping characters occur, this technical solution avoids generating a template file for only one of the entity names, which would make template file generation incomplete and could produce inaccurate template files.

Description

Method and apparatus for generating a template file, computer-readable medium, and electronic device
Technical field
The present invention relates to the field of computer technology, and in particular to a method and an apparatus for generating a template file, a computer-readable medium, and an electronic device.
Background art
In natural language processing, good templates are very important for the corpus of a domain, because they help guarantee generalization and usability. However, how to extract suitable template files from massive amounts of user query data remains a difficult problem, and no effective solution currently exists.
It should be noted that the information disclosed in the background section above is only intended to enhance understanding of the background of the present invention, and may therefore include information that does not constitute prior art known to those of ordinary skill in the art.
Summary of the invention
Embodiments of the present invention provide a method and an apparatus for generating a template file, a computer-readable medium, and an electronic device, which solve, at least to some extent, the problem in the prior art that comprehensive template files cannot be obtained.
Other features and advantages of the present invention will become apparent from the following detailed description, or will be learned in part through practice of the invention.
According to one aspect of the embodiments of the present invention, a method for generating a template file is provided, comprising: detecting the preset entity names contained in corpus data; determining, according to the correspondence between entity names and entity tags, the target entity tag corresponding to each preset entity name; and replacing the preset entity names contained in the corpus data with the target entity tags to generate a template file of the corpus data; wherein, if the corpus data contains multiple preset entity names whose characters overlap, the corresponding entity name in the corpus data is replaced separately with the target entity tag of each of the multiple preset entity names, to generate multiple template files of the corpus data.
According to one aspect of the embodiments of the present invention, an apparatus for generating a template file is provided, comprising: a first detection unit, configured to detect the preset entity names contained in corpus data; a determination unit, configured to determine, according to the correspondence between entity names and entity tags, the target entity tag corresponding to each preset entity name; and a generation unit, configured to replace the preset entity names contained in the corpus data with the target entity tags to generate a template file of the corpus data; wherein the generation unit is further configured to, when the corpus data contains multiple preset entity names whose characters overlap, replace the corresponding entity name in the corpus data separately with the target entity tag of each of the multiple preset entity names, to generate multiple template files of the corpus data.
According to one aspect of the embodiments of the present invention, a computer-readable medium is provided, on which a computer program is stored; when the computer program is executed by a processor, it implements the method for generating a template file described in the above embodiments.
According to one aspect of the embodiments of the present invention, an electronic device is provided, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for generating a template file described in the above embodiments.
In the technical solutions provided by some embodiments of the present invention, the target entity tag corresponding to each preset entity name contained in the corpus data is determined according to the correspondence between entity names and entity tags, and the preset entity names in the corpus data are replaced with the target entity tags, so that the template file of the corpus data can be generated by automatic matching. In addition, when the corpus data contains multiple preset entity names whose characters overlap, the corresponding entity name in the corpus data is replaced separately with the target entity tag of each of these names, so that a template file is generated for each different preset entity name. This avoids generating a template file for only one of the overlapping preset entity names, which would make template file generation incomplete and could produce inaccurate template files.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present invention.
Brief description of the drawings
The accompanying drawings, which are incorporated into and form part of this specification, show embodiments consistent with the present invention and, together with the specification, serve to explain the principles of the invention. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort. In the drawings:
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the method or apparatus for generating a template file of the embodiments of the present invention can be applied;
Fig. 2 shows a schematic structural diagram of a computer system of an electronic device suitable for implementing the embodiments of the present invention;
Fig. 3 schematically shows a flowchart of a method for generating a template file according to one embodiment of the present invention;
Fig. 4 schematically shows a flowchart of a method for generating a template file according to another embodiment of the present invention;
Fig. 5 schematically shows a flowchart of a method for generating a template file according to still another embodiment of the present invention;
Fig. 6 schematically shows a flowchart of a method for generating a template file;
Fig. 7 schematically shows a flowchart of a method for generating a template file according to yet another embodiment of the present invention;
Fig. 8 schematically shows a block diagram of an apparatus for generating a template file according to one embodiment of the present invention;
Fig. 9 schematically shows a block diagram of an apparatus for generating a template file according to another embodiment of the present invention.
Detailed description of the embodiments
Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in many forms and should not be understood as limited to the examples set forth here; rather, these embodiments are provided so that the present invention will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art.
In addition, the described features, structures, or characteristics can be combined in one or more embodiments in any suitable manner. In the following description, many specific details are provided to give a full understanding of the embodiments of the present invention. However, those skilled in the art will appreciate that the technical solutions of the present invention can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so on. In other cases, well-known methods, devices, implementations, or operations are not shown or described in detail, to avoid obscuring aspects of the present invention.
The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities can be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flowcharts shown in the drawings are merely illustrative; they need not include all contents and operations/steps, nor must the steps be performed in the order described. For example, some operations/steps can be split, and others can be merged or partially merged, so the actual order of execution may change according to the actual situation.
Fig. 1 shows a schematic diagram of an exemplary system architecture 100 to which the method or apparatus for generating a template file of the embodiments of the present invention can be applied.
As shown in Fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired communication links and wireless communication links.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are only illustrative. Depending on implementation needs, there can be any number of terminal devices, networks, and servers. For example, the server 105 may be a server cluster composed of multiple servers.
Users can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, to receive or send messages and so on. The terminal devices 101, 102, 103 may be various electronic devices with display screens, including but not limited to smartphones, tablet computers, portable computers, and desktop computers.
The server 105 may be a server that provides various services. For example, the server 105 collects corpus data (such as query statements) sent by users through terminal device 103 (or terminal device 101 or 102), detects the preset entity names contained in the corpus data, and then replaces those entity names with their corresponding entity tags to generate a template file of the corpus data.
In one embodiment of the invention, if there are multiple preset entity names with overlapping characters in the corpus data (for example, the corpus data "film of Liu Dehua" contains the character-overlapping names "Liu Dehua" and "Liu De", both of which are preset entity names), the server 105 can replace the corresponding entity name in the corpus data separately with the entity tag of each of these preset entity names to generate multiple template files, which ensures that the generated template files are comprehensive. For example, if the entity tag of "Liu Dehua" is "actor" and the entity tag of "Liu De" is "director", two template files can be generated: "film of [actor]" and "film of [director]hua".
In one embodiment of the invention, after generating multiple template files, the server 105 can select a suitable one from among them as the final template file. In the above example, the server 105 can select the template "film of [actor]" as the final template file by means of a corresponding selection strategy, which helps to obtain the optimal template file.
It should be noted that the method for generating a template file provided by the embodiments of the present invention is generally executed by the server 105, and accordingly the apparatus for generating a template file is generally arranged in the server 105. In other embodiments of the present invention, however, a terminal can have functions similar to those of the server and thereby execute the template file generation scheme provided by the embodiments of the present invention.
Fig. 2 shows a schematic structural diagram of a computer system of an electronic device suitable for implementing the embodiments of the present invention.
It should be noted that the computer system 200 of the electronic device shown in Fig. 2 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 2, the computer system 200 includes a central processing unit (CPU) 201, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 202 or a program loaded from a storage section 208 into a random access memory (RAM) 203. The RAM 203 also stores various programs and data required for system operation. The CPU 201, the ROM 202, and the RAM 203 are connected to one another through a bus 204. An input/output (I/O) interface 205 is also connected to the bus 204.
The following components are connected to the I/O interface 205: an input section 206 including a keyboard, a mouse, and the like; an output section 207 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 208 including a hard disk and the like; and a communication section 209 including a network interface card such as a LAN card or a modem. The communication section 209 performs communication processing via a network such as the Internet. A drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 210 as needed, so that a computer program read from it can be installed into the storage section 208 as needed.
In particular, according to embodiments of the present invention, the processes described below with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of the present invention include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network through the communication section 209, and/or installed from the removable medium 211. When the computer program is executed by the central processing unit (CPU) 201, the various functions defined in the system of the present application are performed.
It should be noted that the computer-readable medium described in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example (but not limited to), an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program can be used by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, in the present invention, may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal can take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium can be transmitted over any suitable medium, including but not limited to wireless, wired, and so on, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the possible architecture, functions, and operations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each box in a flowchart or block diagram can represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes can occur in an order different from that marked in the drawings. For example, two boxes shown in succession can in fact be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in a block diagram or flowchart, and any combination of boxes in a block diagram or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present invention can be implemented in software or in hardware, and the described units can also be arranged in a processor. In some cases, the names of these units do not limit the units themselves.
As another aspect, the present application further provides a computer-readable medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the methods described in the following embodiments. For example, the electronic device can implement the steps shown in Fig. 3 to Fig. 7.
The implementation details of the technical solutions of the embodiments of the present invention are described below:
Fig. 3 schematically shows a flowchart of a method for generating a template file according to one embodiment of the present invention. The method is applicable to the electronic device described in the preceding embodiments. Referring to Fig. 3, the method includes at least steps S310 to S330, described in detail as follows:
In step S310, the preset entity names contained in the corpus data are detected.
In one embodiment of the invention, corpus data refers to natural language data that users actually use in real application scenarios. An entity is the basic unit of a concept, and an entity name is the word formed from an entity.
In step S320, the target entity tag corresponding to each preset entity name is determined according to the correspondence between entity names and entity tags.
In one embodiment of the invention, an entity tag identifies the category to which an entity name belongs, for example whether the entity name belongs to "actor" or "director".
In step S330, the preset entity names contained in the corpus data are replaced with the target entity tags to generate a template file of the corpus data; wherein, if the corpus data contains multiple preset entity names whose characters overlap, the corresponding entity name in the corpus data is replaced separately with the target entity tag of each of these preset entity names, to generate multiple template files of the corpus data.
The technical solution of the embodiment shown in Fig. 3 makes it possible to generate the template file of corpus data by automatic matching. In addition, when the corpus data contains multiple preset entity names whose characters overlap, the corresponding entity name in the corpus data is replaced separately with the target entity tag of each of these names, so that a template file is generated for each different preset entity name. This avoids generating a template file for only one of the overlapping preset entity names, which would make template file generation incomplete and could produce inaccurate template files.
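Steps S310 to S330 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `generate_templates`, the plain dictionary used as the name-to-tag correspondence, and the naive `str.find` detection are all assumptions made for illustration, and the Chinese strings are assumed originals of the "film of Liu Dehua" example. The sketch also assumes that every detected name overlaps with the others, so each name yields its own template.

```python
def generate_templates(corpus, name_to_tag):
    """Return one template per preset entity name found in `corpus`.

    Hypothetical helper: assumes all detected names overlap each other,
    so each one is substituted on its own copy of the corpus.
    """
    templates = []
    for name, tag in name_to_tag.items():
        start = corpus.find(name)  # S310: detect the preset entity name
        if start == -1:
            continue  # this preset name does not occur in the corpus
        end = start + len(name)
        # S320: `tag` is the target entity tag taken from the correspondence.
        # S330: replace only this name with its tag to form one template.
        templates.append(corpus[:start] + f"[{tag}]" + corpus[end:])
    return templates


# Assumed Chinese original of "film of Liu Dehua".
corpus = "刘德华的电影"
preset = {"刘德华": "actor", "刘德": "director"}
print(generate_templates(corpus, preset))
```

With overlapping names that carry different tags, the function returns two templates, one per name, which matches the behaviour described for step S330.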
The implementation details of the method for generating a template file shown in Fig. 3 are described below through two embodiments:
In one embodiment of the invention, detecting the preset entity names contained in the corpus data in step S310 includes: detecting the position information, within the corpus data, of the preset entity names contained in the corpus data.
In one embodiment of the invention, the position information of a preset entity name in the corpus data can be determined from the characters contained in the corpus data. For example, if the corpus data is "film of Liu Dehua and Zhang Xueyou", the position information of the entity name "Liu Dehua" is [0 | 3]; that is, position 0 is the start position (position 0 itself is not included) and position 3 is the end position.
In one embodiment of the invention, after the position information of a preset entity name in the corpus data has been detected, the characters at that position in the corpus data can be replaced with the target entity tag. In the above example, if the entity tag corresponding to the entity name "Liu Dehua" is "actor", then "actor" replaces the characters in the corpus data "film of Liu Dehua and Zhang Xueyou" that start at position 0 and end at position 3.
In one embodiment of the invention, an AC automaton (Aho-Corasick automaton, a multi-pattern matching algorithm) can be used to detect and return both the position information of the preset entity names contained in the corpus data and the target entity tags corresponding to those names. In the above example, the AC automaton returns [0 | 3 | actor], which indicates that the entity name formed by the characters starting at position 0 and ending at position 3 corresponds to the entity tag "actor". The technical solution of this embodiment thus makes it possible to detect and return, with a single AC automaton, both the position information of the preset entity names in the corpus data and their corresponding target entity tags, which improves the matching efficiency of the algorithm.
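The following is a compact, self-contained Aho-Corasick sketch that returns `(start, end, tag)` triples in the spirit of the [0 | 3 | actor] result above. It is an illustrative implementation, not the one used in the patent: the names `build_automaton` and `find_entities`, the list-based node layout, and the use of Python slice offsets (start inclusive, end exclusive) are assumptions, as are the Chinese example strings.

```python
from collections import deque


def build_automaton(name_to_tag):
    """Build trie transitions, failure links and outputs for the names."""
    goto, fail, out = [{}], [0], [[]]  # node 0 is the root
    for name, tag in name_to_tag.items():
        node = 0
        for ch in name:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append((len(name), tag))  # remember length and tag
    queue = deque(goto[0].values())  # depth-1 nodes fail to the root
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_entities(corpus, automaton):
    """Scan `corpus` once and return (start, end, tag) for every match."""
    goto, fail, out = automaton
    node, hits = 0, []
    for i, ch in enumerate(corpus):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for length, tag in out[node]:
            hits.append((i + 1 - length, i + 1, tag))
    return hits


automaton = build_automaton({"刘德华": "actor", "张学友": "actor"})
print(find_entities("刘德华和张学友的电影", automaton))
```

Because each hit carries its tag, one pass over the corpus yields both the position and the entity tag, which is the efficiency point made in the paragraph above. Overlapping names such as "刘德华" and "刘德" are both reported, so a caller can decide how to handle them.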
In one embodiment of the invention, as described above, the position information of a preset entity name in the corpus data is the position of the characters of that name in the corpus data, arranged according to a first order (for example, from left to right). On this basis, if the corpus data contains multiple non-overlapping preset entity names, then when the preset entity names in the corpus data are replaced with target entity tags, the non-overlapping names can be replaced one after another in the order in which they appear according to a second order, where the second order is the opposite of the first order.
In this embodiment, suppose for example that the corpus data is "film of Liu Dehua and Zhang Xueyou" and that the AC automaton returns [0 | 3 | actor] and [4 | 7 | actor]. If the replacement proceeds from left to right and positions 0-3 in the corpus data (not including position 0) are replaced with "actor" first, the result is "film of [actor] and Zhang Xueyou". The position of the entity name "Zhang Xueyou" has then changed and is no longer the original 4-7 (not including position 4), so continuing the replacement with the original positions would produce a wrong template file. With the technical solution of this embodiment, the replacement can instead follow the order in which the entity names "Liu Dehua" and "Zhang Xueyou" appear from right to left: positions 4-7 in the corpus data are replaced with "actor" first, giving "film of Liu Dehua and [actor]", and positions 0-3 are then replaced with "actor", giving "film of [actor] and [actor]". The technical solution of the first embodiment therefore guarantees that, when multiple non-overlapping entity names appear in the corpus data, the entity tag replacement does not go wrong and an accurate template file is obtained.
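The right-to-left replacement can be sketched in a few lines of Python. The function name `replace_spans` and the `(start, end, tag)` tuple layout are hypothetical, offsets are Python slice offsets (start inclusive, end exclusive), and the Chinese string is the assumed original of the example above.

```python
def replace_spans(corpus, spans):
    """Replace non-overlapping (start, end, tag) spans, rightmost first.

    Working from right to left keeps the offsets of the spans that are
    still to be replaced valid, because only text after them changes.
    """
    for start, end, tag in sorted(spans, key=lambda s: s[0], reverse=True):
        corpus = corpus[:start] + f"[{tag}]" + corpus[end:]
    return corpus


spans = [(0, 3, "actor"), (4, 7, "actor")]
print(replace_spans("刘德华和张学友的电影", spans))
```

Replacing the same spans from left to right with the original offsets would corrupt the second replacement, since "[actor]" is longer than the three characters it replaces.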
In another embodiment of the present invention, detecting the preset entity names contained in the corpus data in step S310 includes: detecting the character content of the preset entity names contained in the corpus data.
In one embodiment of the invention, because the character content of the preset entity names contained in the corpus data is detected directly, that character content in the corpus data can be replaced with the target entity tag. For example, if the corpus data is "film of Liu Dehua and Zhang Xueyou", the character contents of the detected preset entity names are "Liu Dehua" and "Zhang Xueyou", and both names correspond to the entity tag "actor", then "actor" can directly replace "Liu Dehua" and "Zhang Xueyou" in the corpus data, giving "film of [actor] and [actor]". The technical solution of the second embodiment therefore also guarantees that, when multiple non-overlapping entity names appear in the corpus data, the entity tag replacement does not go wrong and an accurate template file is obtained.
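Replacement by character content can be sketched with Python's built-in `str.replace`, which needs no position bookkeeping. The function name `replace_by_content` and the dictionary of detected names are assumptions for illustration, and the Chinese string is the assumed original of the example above.

```python
def replace_by_content(corpus, detected):
    """Replace each detected entity name with its bracketed tag.

    `detected` maps the character content of each detected name to its
    entity tag. The names are assumed not to overlap one another.
    """
    for name, tag in detected.items():
        # str.replace substitutes every occurrence of `name`.
        corpus = corpus.replace(name, f"[{tag}]")
    return corpus


detected = {"刘德华": "actor", "张学友": "actor"}
print(replace_by_content("刘德华和张学友的电影", detected))
```

Note that `str.replace` substitutes all occurrences of a name, which may or may not be the behaviour intended by the source when the same name appears more than once.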
In one embodiment of the invention, for the technical solution of the second embodiment, a first AC automaton can be used to detect and return the character content of the preset entity names contained in the corpus data, and a second AC automaton can be used to determine the target entity tags corresponding to those names, so that an accurate template file is obtained by directly replacing the character content.
In one embodiment of the present invention, as shown in Fig. 4, the generation method of a template file according to another embodiment of the present invention further includes, on the basis of step S310 and step S320 shown in Fig. 3:
Step S410: if multiple default entity names with overlapping characters exist in the corpus data, judge whether the entity tags corresponding to the multiple default entity names are identical; if so, execute step S420; otherwise, execute step S430.
Step S420: replace the entity name with the largest number of characters among the multiple default entity names with the entity tag corresponding to the multiple default entity names, to generate the template file of the corpus data.
For example, if the corpus data contains "Liu Dehua" and "Liu De", which overlap in characters, both are default entity names, and the entity tag corresponding to both is "actor", then "Liu Dehua" in the corpus data can be replaced with "actor".
Step S430: replace the corresponding entity names in the corpus data with the target entity tags corresponding to the multiple default entity names respectively, to generate multiple template files of the corpus data.
For example, if the corpus data "the film of Liu Dehua" contains "Liu Dehua" and "Liu De", which overlap in characters, both are default entity names, the entity tag corresponding to "Liu Dehua" is "actor", and the entity tag corresponding to "Liu De" is "director", then two template files can be generated: "the film of [actor]" and "the film of [director] Hua".
With the technical solution of the embodiment shown in Fig. 4, when multiple default entity names with overlapping characters exist in the corpus data and their corresponding entity tags are identical, the entity name with the largest number of characters is selected for replacement, which ensures an accurate template file; and when their corresponding entity tags are not identical, the names are replaced separately to generate multiple template files, which ensures a comprehensive set of template files.
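As a non-limiting illustration, the branch between step S420 and step S430 may be sketched as follows. The function name `templates_for_overlap`, the assumption that the overlapping names have already been grouped, and the Chinese rendering of "the film of Liu Dehua" are all illustrative.

```python
def templates_for_overlap(text, overlapping):
    """Build templates for names that overlap each other in text.

    overlapping maps each overlapping entity name to its entity tag.
    """
    if len(set(overlapping.values())) == 1:
        # Step S420: identical tags, so only the longest name is replaced.
        longest = max(overlapping, key=len)
        return [text.replace(longest, "[" + overlapping[longest] + "]", 1)]
    # Step S430: different tags, so one template is produced per name.
    return [
        text.replace(name, "[" + label + "]", 1)
        for name, label in overlapping.items()
    ]


corpus = "刘德华的电影"  # assumed original of "the film of Liu Dehua"
same = templates_for_overlap(corpus, {"刘德华": "actor", "刘德": "actor"})
diff = templates_for_overlap(corpus, {"刘德华": "actor", "刘德": "director"})
print(same)  # ['[actor]的电影']
print(diff)  # ['[actor]的电影', '[director]华的电影']
```

In the second result, the leftover character 华 ("Hua") is what the later filtering step looks for.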
Based on the technical solution of the preceding embodiments, as shown in Fig. 5, the generation method of a template file according to still another embodiment of the present invention further includes the following steps:
Step S510: detect whether the generated multiple template files contain any one of the default entity names, or contain the non-overlapping part of any two entity names among the default entity names.
In one embodiment of the present invention, take the corpus data "the film of Liu Dehua and Zhang Xueyou" as an example: the entity tag corresponding to the entity name "Liu Dehua" is "actor"; the entity tag corresponding to the entity name "Liu De" is "director"; the entity tag corresponding to the entity name "Zhang Xueyou" is "actor"; and the entity tag corresponding to the entity name "Zhang Xue" is "director". The technical solution of the above embodiments of the present invention may then produce the following template files: the film of [actor] and Zhang Xueyou; the film of [actor] and [actor]; the film of [actor] and [director] You; the film of Liu Dehua and [actor]; the film of Liu Dehua and [director] You; the film of [director] Hua and Zhang Xueyou; the film of [director] Hua and [actor]; the film of [director] Hua and [director] You. Some of these template files contain the entity name "Liu Dehua" or the entity name "Zhang Xueyou", or contain "Hua", the non-overlapping part of "Liu Dehua" and "Liu De", or "You", the non-overlapping part of "Zhang Xueyou" and "Zhang Xue".
Step S520: if any template file is detected to contain any one of the default entity names, or to contain the non-overlapping part of any two entity names among the default entity names, delete that template file from the multiple template files, so as to filter the multiple template files.
In one embodiment of the present invention, in the above example, since "Liu Dehua" and "Zhang Xueyou" are both entity names, template files that contain either of these entity names are clearly inaccurate; and template files that contain "Hua" or "You" were clearly produced by directly replacing "Liu De" or "Zhang Xue" without considering the more probable "Liu Dehua" and "Zhang Xueyou". These template files therefore need to be deleted, which ensures that more accurate template files are obtained.
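As a non-limiting illustration, steps S510 and S520 may be sketched as follows. The function names are hypothetical, and the "non-overlapping part" is computed here only for the case in which the shorter name is a prefix of the longer one, which is an assumption that matches the example but may be narrower than the embodiment intends.

```python
from itertools import combinations


def non_overlapping_parts(names):
    """Leftover suffixes of name pairs where one name is a prefix of the other."""
    parts = set()
    for a, b in combinations(names, 2):
        short, long_ = sorted((a, b), key=len)
        if long_ != short and long_.startswith(short):
            parts.add(long_[len(short):])
    return parts


def filter_templates(templates, names):
    """Drop every template that still contains a name or a leftover part."""
    banned = set(names) | non_overlapping_parts(names)
    return [t for t in templates if not any(b in t for b in banned)]


names = ["刘德华", "刘德", "张学友", "张学"]
candidates = [
    "[actor]和张学友的电影", "[actor]和[actor]的电影",
    "[actor]和[director]友的电影", "刘德华和[actor]的电影",
    "刘德华和[director]友的电影", "[director]华和张学友的电影",
    "[director]华和[actor]的电影", "[director]华和[director]友的电影",
]
kept = filter_templates(candidates, names)
print(kept)  # ['[actor]和[actor]的电影']
```

A plain substring test is used for brevity; it would also discard a template whose ordinary text happens to contain a leftover character, so a practical implementation might need a stricter check.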
The implementation details of the technical solution of the embodiments of the present invention are described in detail below with reference to Fig. 6 and Fig. 7.
As shown in Fig. 6, one generation method of a template file includes the following steps:
Step S601: traverse the corpus data character by character.
Step S602: judge whether an entity name in the database is matched starting from the current character; if so, execute step S603; otherwise, execute step S604.
Step S603: replace the entity name with the entity tag.
Step S604: judge whether the traversal has reached the end; if so, the template file has been obtained; otherwise, return to step S601 and continue traversing.
The technical solution shown in Fig. 6 performs brute-force string matching on the corpus data character by character and replaces an entity name with its entity tag as soon as it is matched. This scheme has the following problems:
1. Suppose the corpus data is "who is the wife of Liu Dehua". If the entity names "Liu De" and "Liu Dehua" both exist and the entity tag corresponding to both is singer, the template obtained by the scheme shown in Fig. 6 is "who is the wife of [singer] Hua".
2. Suppose the entity tag corresponding to the entity name "Liu De" is actor and the entity tag corresponding to the entity name "Liu Dehua" is singer. The template obtained by the scheme shown in Fig. 6 is then "who is the wife of [actor] Hua", and the genuinely useful template "who is the wife of [singer]" is never generated. That is, when entity names overlap, only one template can be obtained, and the full permutation of all possible templates cannot be obtained.
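As a non-limiting illustration, the brute-force scheme of Fig. 6 and the two problems above may be reproduced with the following sketch. The function name, the Chinese rendering of the corpus sentence, and the rule that the first name found in database order wins are assumptions; the figure does not specify how competing names are ordered.

```python
def brute_force_template(text, entity_labels):
    """Scan text left to right, replacing the first entity name found at each
    position with its bracketed tag (steps S601 to S604)."""
    out = []
    i = 0
    while i < len(text):  # S604: stop at the end of the text
        for name, label in entity_labels.items():  # S602: a name starts here?
            if text.startswith(name, i):
                out.append("[" + label + "]")  # S603: replace name with tag
                i += len(name)
                break
        else:
            out.append(text[i])  # no name starts here, keep the character
            i += 1
    return "".join(out)


corpus = "刘德华的老婆是谁"  # assumed original of "who is the wife of Liu Dehua"
same_label = {"刘德": "singer", "刘德华": "singer"}
diff_label = {"刘德": "actor", "刘德华": "singer"}
print(brute_force_template(corpus, same_label))  # [singer]华的老婆是谁
print(brute_force_template(corpus, diff_label))  # [actor]华的老婆是谁
```

In both runs the shorter name is consumed first, the character 华 ("Hua") is left behind, and exactly one template is produced.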
To address the above problems, the embodiments of the present invention provide the following solution which, as shown in Fig. 7, includes step S701 and step S702, described in detail below:
In step S701, the AC automaton performs multi-pattern matching.
In one embodiment of the present invention, step S701 may include the following processes in a specific implementation: establishing the AC automaton, resolving the various problems that arise from the multi-pattern matching algorithm, and forming the full permutation of multiple entity address conflicts.
1. Establishment of the AC automaton
In one embodiment of the present invention, to improve computational efficiency, Spark (a computing engine) can be chosen to process the massive data, and the AC automaton can be established in Spark and loaded with the relevant entity names and entity tags; for example, a massive preset collection of actor and director names serves as the entity names, and the actor and director tags are set respectively.
The AC automaton usually returns "start|end|label", denoting respectively the start position, end position, and entity tag of a matched entity name. If multiple entity names are matched in one piece of corpus data and the first matched entity name is replaced first, the positions of the subsequent entity names change, and the absolute position values returned by the AC automaton lose their meaning. For example, for the corpus data "the film of Liu Dehua and Zhang Xueyou", suppose the AC automaton returns [0|3|actor], [4|7|actor] (positions refer to characters of the original Chinese sentence, in which "Liu Dehua" comes first). If positions 0-3 of the corpus data (counted from left to right, excluding position 0) are replaced with "actor", giving "the film of [actor] and Zhang Xueyou", the position of the entity name "Zhang Xueyou" changes and is no longer the original 4-7 (excluding position 4); continuing to replace would then produce an incorrect template file. To solve this problem, in one embodiment of the present invention, two AC automata can be established, one returning the matched entity name and the other returning the entity tag, and the tag is finally substituted according to the entity name, which resolves the replacement of entity names with entity tags.
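As a non-limiting illustration, matches of the form start|end|label can be produced by a small Aho-Corasick automaton such as the following sketch. The class name `ACAutomaton`, its method names, and the use of Python half-open offsets are assumptions for illustration; the sketch runs on a single machine and does not show the Spark integration mentioned above.

```python
from collections import deque


class ACAutomaton:
    """Minimal Aho-Corasick automaton returning (start, end, label) matches."""

    def __init__(self, entity_labels):
        self.goto = [{}]   # child edges of each trie node
        self.fail = [0]    # failure link of each node
        self.out = [[]]    # (name length, label) pairs ending at each node
        for name, label in entity_labels.items():
            self._add(name, label)
        self._build_failure_links()

    def _add(self, name, label):
        node = 0
        for ch in name:
            if ch not in self.goto[node]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[node][ch] = len(self.goto) - 1
            node = self.goto[node][ch]
        self.out[node].append((len(name), label))

    def _build_failure_links(self):
        # Breadth-first traversal; depth-1 nodes keep failure link 0 (the root).
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Inherit the names that end at the failure target.
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)

    def search(self, text):
        matches, node = [], 0
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for length, label in self.out[node]:
                matches.append((i + 1 - length, i + 1, label))
        return sorted(matches)


ac = ACAutomaton({"刘德华": "actor", "刘德": "director",
                  "张学友": "actor", "张学": "director"})
matches = ac.search("刘德华和张学友的电影")
print(matches)
```

For the assumed sentence, the overlapping names "Liu De" and "Liu Dehua" are both reported, as (0, 2, 'director') and (0, 3, 'actor'), which is the situation the conflict handling described in the embodiments has to resolve.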
In another embodiment of the present invention, the replacement can follow the order in which the entity names "Liu Dehua" and "Zhang Xueyou" appear from right to left: positions 4-7 of the corpus data are replaced with "actor" first, giving "the film of Liu Dehua and [actor]", and positions 0-3 are then replaced with "actor", giving "the film of [actor] and [actor]". This also resolves the replacement of entity names with entity tags.
2. Solutions to the various problems that arise from the multi-pattern matching algorithm
In an embodiment of the present invention, the problems that arise from the multi-pattern matching algorithm include the address conflict problem of same-class entities and the address conflict problem of different-class entities, described separately below:
2.1. Address conflict problem of same-class entities
In one embodiment of the present invention, taking the corpus data "the film of Liu Dehua and Zhang Xueyou" as an example, assume that the entity tag corresponding to both "Liu Dehua" and "Liu De" is actor; the AC automaton then returns [0|2|actor], [0|3|actor]. By analyzing a large number of sentences, it can be found that for address conflicts between entities of the same type, choosing the longer address is usually a reasonable choice; in this example, replacing "Liu Dehua" with the entity tag is a very reasonable choice. The following conclusion can therefore be drawn: when the addresses of same-class entities conflict, the longer entity is selected for replacement.
2.2. Address conflict problem of different-class entities
In one embodiment of the present invention, taking the corpus data "the film of Liu Dehua and Zhang Xueyou" as an example, if the entity tag corresponding to the entity name "Liu Dehua" is actor and the entity tag corresponding to the entity name "Liu De" is director, the AC automaton returns [0|2|director], [0|3|actor]. Although the addresses of the entity names conflict, the corresponding entity tags differ, so the two templates can be regarded as equally probable, and two templates can be generated: "the film of [actor] and Zhang Xueyou" and "the film of [director] Hua and Zhang Xueyou".
3. Full permutation of multiple entity address conflicts
In one embodiment of the present invention, taking the corpus data "the film of Liu Dehua and Zhang Xueyou" as an example, suppose the entity tag corresponding to the entity name "Liu Dehua" is "actor"; the entity tag corresponding to the entity name "Liu De" is "director"; the entity tag corresponding to the entity name "Zhang Xueyou" is "actor"; and the entity tag corresponding to the entity name "Zhang Xue" is "director". The technical solution of the above embodiments of the present invention may then produce the following template files: the film of [actor] and Zhang Xueyou; the film of [actor] and [actor]; the film of [actor] and [director] You; the film of Liu Dehua and [actor]; the film of Liu Dehua and [director] You; the film of [director] Hua and Zhang Xueyou; the film of [director] Hua and [actor]; the film of [director] Hua and [director] You.
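As a non-limiting illustration, one reading that reproduces the eight templates listed above is that, for each group of conflicting names, either one of the names is replaced or the group is left untouched, and the unchanged sentence is discarded. The sketch below follows that reading; the function name `all_templates`, the pre-grouped input, and the half-open offsets are assumptions for illustration.

```python
from itertools import product


def all_templates(text, groups):
    """Enumerate templates for groups of mutually conflicting spans.

    groups is a list of lists of (start, end, label) spans; within each group
    at most one span is replaced.
    """
    templates = []
    # None means "leave this group of names unreplaced".
    for choice in product(*[[None] + group for group in groups]):
        spans = [span for span in choice if span is not None]
        if not spans:
            continue  # skip the sentence with no replacement at all
        result = text
        # Replace from right to left so earlier offsets stay valid.
        for start, end, label in sorted(spans, reverse=True):
            result = result[:start] + "[" + label + "]" + result[end:]
        templates.append(result)
    return templates


corpus = "刘德华和张学友的电影"  # assumed original sentence
groups = [
    [(0, 3, "actor"), (0, 2, "director")],  # Liu Dehua / Liu De
    [(4, 7, "actor"), (4, 6, "director")],  # Zhang Xueyou / Zhang Xue
]
templates = all_templates(corpus, groups)
for t in templates:
    print(t)
```

With two groups of two names each there are 3 x 3 = 9 combinations, and dropping the unchanged sentence leaves the eight candidates.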
In step S702, template selection is performed.
The technical solution of the above embodiments of the present invention can yield multiple template files, but not all of them are realistic. To pick out suitable template files, in one embodiment of the present invention, a candidate template is filtered out if it contains a related entity name of the field or the difference between entity names (for example, the difference between "Liu Dehua" and "Liu De" is "Hua", and the difference between "Zhang Xueyou" and "Zhang Xue" is "You"). In the templates above, for instance, a template must not contain "Liu Dehua", "Zhang Xueyou", "Liu De", "Zhang Xue", "Hua", or "You", so the template file obtained after filtering is: the film of [actor] and [actor].
The technical solution of the above embodiments of the present invention solves the entity position conflict problem both for identical entity tags and for different entity tags, and also handles the full permutation of all possibilities. In addition, it makes it convenient to mine template files for a newly created field, improves semantic support and template mining for content not yet covered in existing fields, and can recall more corpora in a given field, increasing the richness of the field corpus.
The device embodiments of the present invention are introduced below; they can be used to execute the generation method of a template file in the above embodiments of the present invention. For details not disclosed in the device embodiments, refer to the embodiments of the generation method of a template file described above.
Fig. 8 schematically shows a block diagram of a generating device of a template file according to one embodiment of the present invention.
Referring to Fig. 8, the generating device 800 of a template file according to one embodiment of the present invention includes: a first detection unit 801, a determination unit 802, and a generation unit 803.
The first detection unit 801 is configured to detect the default entity name contained in the corpus data; the determination unit 802 is configured to determine, according to the correspondence between entity names and entity tags, the target entity tag corresponding to the default entity name; the generation unit 803 is configured to replace the default entity name contained in the corpus data with the target entity tag, to generate the template file of the corpus data; wherein the generation unit 803 is further configured to, when multiple default entity names with overlapping characters exist in the corpus data, replace the corresponding entity names in the corpus data with the target entity tags corresponding to the multiple default entity names respectively, to generate multiple template files of the corpus data.
Referring to Fig. 9, the generating device 900 of a template file according to another embodiment of the present invention further includes, in addition to the first detection unit 801, the determination unit 802, and the generation unit 803 shown in Fig. 8: a second detection unit 901 and a deletion unit 902.
The second detection unit 901 is configured to detect, after the generation unit 803 generates the multiple template files of the corpus data, whether the multiple template files contain any one of the default entity names, or contain the non-overlapping part of any two entity names among the default entity names;
the deletion unit 902 is configured to delete, when the second detection unit 901 detects that any template file contains any one of the default entity names or contains the non-overlapping part of any two entity names among the default entity names, that template file from the multiple template files, so as to filter the multiple template files.
In one embodiment of the present invention, the generating devices of a template file shown in Fig. 8 and Fig. 9 may further include a judging unit. The judging unit is configured to judge whether the entity tags corresponding to the multiple default entity names are identical; the generation unit 803 is configured to, when the entity tags corresponding to the multiple default entity names are not identical, replace the corresponding entity names in the corpus data with the target entity tags corresponding to the multiple default entity names respectively.
In one embodiment of the present invention, based on the foregoing scheme, the generation unit 803 is further configured to: when the entity tags corresponding to the multiple default entity names are identical, replace the entity name with the largest number of characters among the multiple default entity names with the entity tag corresponding to the multiple default entity names, to generate the template file of the corpus data.
In one embodiment of the present invention, based on the foregoing scheme, the first detection unit 801 is configured to: detect the position information, in the corpus data, of the default entity name contained in the corpus data.
In one embodiment of the present invention, based on the foregoing scheme, the generation unit 803 is configured to: replace the characters in the corpus data that correspond to the position information with the target entity tag.
In one embodiment of the present invention, based on the foregoing scheme, the position information is position information of the characters of the default entity name arranged in the corpus data according to a first order; the generation unit 803 is configured to: when the corpus data contains multiple non-overlapping default entity names, replace the multiple non-overlapping default entity names one after another according to the sequence in which they appear in the corpus data in a second order, wherein the first order is opposite to the second order.
In one embodiment of the present invention, based on the foregoing scheme, an AC automaton detects and returns the position information, in the corpus data, of the default entity name contained in the corpus data, together with the target entity tag corresponding to the default entity name.
In one embodiment of the present invention, based on the foregoing scheme, the first detection unit 801 is configured to: detect the character content of the default entity name contained in the corpus data.
In one embodiment of the present invention, based on the foregoing scheme, the generation unit 803 is configured to: replace the character content in the corpus data with the target entity tag.
In one embodiment of the present invention, based on the foregoing scheme, a first AC automaton detects and returns the character content of the default entity name contained in the corpus data, and a second AC automaton determines the target entity tag corresponding to the default entity name.
It should be noted that although several modules or units of the device for executing actions are mentioned in the above detailed description, this division is not mandatory. In fact, according to embodiments of the present invention, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by multiple modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the example embodiments described herein can be implemented by software, or by software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes instructions that cause a computing device (such as a personal computer, a server, a touch terminal, or a network device) to execute the method according to the embodiments of the present invention.
Those skilled in the art, after considering the specification and practicing the invention disclosed here, will readily conceive of other embodiments of the present invention. This application is intended to cover any variations, uses, or adaptations of the present invention that follow its general principles and include common knowledge or conventional technical means in the art not disclosed herein. The description and embodiments are to be regarded as illustrative only, and the true scope and spirit of the present invention are indicated by the following claims.
It should be understood that the present invention is not limited to the precise structure already described above and shown in the accompanying drawings, and And various modifications and changes may be made without departing from the scope thereof.The scope of the present invention is limited only by the attached claims.

Claims (14)

1. A generation method of a template file, characterized by comprising:
detecting a default entity name contained in corpus data;
determining, according to the correspondence between entity names and entity tags, a target entity tag corresponding to the default entity name;
replacing the default entity name contained in the corpus data with the target entity tag, to generate a template file of the corpus data;
wherein, if multiple default entity names with overlapping characters exist in the corpus data, the corresponding entity names in the corpus data are replaced with the target entity tags corresponding to the multiple default entity names respectively, to generate multiple template files of the corpus data.
2. The generation method of a template file according to claim 1, characterized in that, after the multiple template files of the corpus data are generated, the method further comprises:
detecting whether the multiple template files contain any one of the default entity names, or contain the non-overlapping part of any two entity names among the default entity names;
if any template file is detected to contain any one of the default entity names, or to contain the non-overlapping part of any two entity names among the default entity names, deleting that template file from the multiple template files, so as to filter the multiple template files.
3. The generation method of a template file according to claim 1, characterized in that, before the corresponding entity names in the corpus data are replaced with the target entity tags corresponding to the multiple default entity names respectively, the method further comprises:
judging whether the entity tags corresponding to the multiple default entity names are identical;
if the entity tags corresponding to the multiple default entity names are not identical, triggering the step of replacing the corresponding entity names in the corpus data with the target entity tags corresponding to the multiple default entity names respectively.
4. The generation method of a template file according to claim 3, characterized by further comprising:
if the entity tags corresponding to the multiple default entity names are identical, replacing the entity name with the largest number of characters among the multiple default entity names with the entity tag corresponding to the multiple default entity names, to generate the template file of the corpus data.
5. The generation method of a template file according to claim 1, characterized in that detecting the default entity name contained in the corpus data comprises:
detecting the position information, in the corpus data, of the default entity name contained in the corpus data.
6. The generation method of a template file according to claim 5, characterized in that replacing the default entity name contained in the corpus data with the target entity tag comprises:
replacing the characters in the corpus data that correspond to the position information with the target entity tag.
7. The generation method of a template file according to claim 5, characterized in that the position information is position information of the characters of the default entity name arranged in the corpus data according to a first order;
if the corpus data contains multiple non-overlapping default entity names, replacing the default entity name contained in the corpus data with the target entity tag comprises: replacing the multiple non-overlapping default entity names one after another according to the sequence in which they appear in the corpus data in a second order, wherein the first order is opposite to the second order.
8. The generation method of a template file according to claim 5, characterized in that an AC automaton detects and returns the position information, in the corpus data, of the default entity name contained in the corpus data, together with the target entity tag corresponding to the default entity name.
9. The generation method of a template file according to claim 1, characterized in that detecting the default entity name contained in the corpus data comprises:
detecting the character content of the default entity name contained in the corpus data.
10. The generation method of a template file according to claim 9, characterized in that replacing the default entity name contained in the corpus data with the target entity tag comprises:
replacing the character content in the corpus data with the target entity tag.
11. The generation method of a template file according to claim 9, characterized in that a first AC automaton detects and returns the character content of the default entity name contained in the corpus data, and a second AC automaton determines the target entity tag corresponding to the default entity name.
12. A generating device of a template file, characterized by comprising:
a first detection unit, configured to detect a default entity name contained in corpus data;
a determination unit, configured to determine, according to the correspondence between entity names and entity tags, a target entity tag corresponding to the default entity name;
a generation unit, configured to replace the default entity name contained in the corpus data with the target entity tag, to generate a template file of the corpus data;
wherein the generation unit is further configured to, when multiple default entity names with overlapping characters exist in the corpus data, replace the corresponding entity names in the corpus data with the target entity tags corresponding to the multiple default entity names respectively, to generate multiple template files of the corpus data.
13. A computer-readable medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the generation method of a template file according to any one of claims 1 to 11.
14. An electronic device, characterized by comprising:
one or more processors;
a storage device, configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the generation method of a template file according to any one of claims 1 to 11.
CN201810367499.8A 2018-04-23 2018-04-23 Template file generation method and device, computer readable medium and electronic equipment Active CN110309315B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810367499.8A CN110309315B (en) 2018-04-23 2018-04-23 Template file generation method and device, computer readable medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN110309315A true CN110309315A (en) 2019-10-08
CN110309315B CN110309315B (en) 2024-02-02

Family

ID=68073888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810367499.8A Active CN110309315B (en) 2018-04-23 2018-04-23 Template file generation method and device, computer readable medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110309315B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104317839A (en) * 2014-10-10 2015-01-28 北京国双科技有限公司 Method and device for generating report form template
CN106910501A (en) * 2017-02-27 2017-06-30 腾讯科技(深圳)有限公司 Text entities extracting method and device
CN107577655A (en) * 2016-07-05 2018-01-12 北京国双科技有限公司 Name acquiring method and apparatus
CN107608960A (en) * 2017-09-08 2018-01-19 北京奇艺世纪科技有限公司 A kind of method and apparatus for naming entity link

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
姜鹏等: "一种基于云平台的防汛文档智能生成模型构建", 《水利信息化》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046667A (en) * 2019-11-14 2020-04-21 深圳市优必选科技股份有限公司 Sentence recognition method, sentence recognition device and intelligent equipment
CN111046667B (en) * 2019-11-14 2024-02-06 深圳市优必选科技股份有限公司 Statement identification method, statement identification device and intelligent equipment

Also Published As

Publication number Publication date
CN110309315B (en) 2024-02-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant