
CN109597888A - Method and apparatus for establishing a text field identification model

Info

Publication number: CN109597888A
Application number: CN201811376081.XA
Authority: CN
Inventors: 梁川; 梁一川; 凌光; 林英展; 徐威
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2018-11-19
Filing date: 2018-11-19
Publication date: 2019-04-09

Abstract

The present invention provides a method and apparatus for establishing a text field identification model. The method comprises: obtaining texts that have not undergone domain classification; labeling the field to which each text belongs using domain classification templates; and training a classification model with each text as input and the field labeled for each text as output, to obtain a text field identification model, wherein the text field identification model can identify the field to which a text input by a user belongs. The present invention can solve the overfitting problem that arises in the prior art when template classification or model classification is used alone, and thereby improves the accuracy with which the text field identification model identifies the field to which a text belongs.

Description

Method and apparatus for establishing a text field identification model

[technical field]

The present invention relates to the technical field of natural language processing, and in particular to a method, apparatus, device and computer storage medium for establishing a text field identification model.

[background technique]

In some dialogue systems, after the query text input by a user is obtained, the correct field module must be selected from field modules such as recommendation, question answering and chat to handle the user's query request. A problem that urgently needs to be solved is therefore how to identify the field of the query text input by the user.

In the prior art, the field to which a text belongs is generally identified using either templates alone or a learning model alone, and each of these two classification approaches has limitations when used by itself. When a learning model alone is used to identify the field of a text, the main drawback is that, if sufficient labeled data cannot be obtained, the trained classification model suffers from a serious overfitting problem and cannot identify the field of a text accurately. When templates alone are used, the main drawback is that accurate identification requires a large number of manually configured classification templates, which consumes a great deal of labor; if the number of classification templates is small, a serious overfitting problem likewise arises.

[summary of the invention]

In view of this, the present invention provides a method, apparatus, device and computer storage medium for establishing a text field identification model, which solve the overfitting problem that arises in the prior art when template classification or model classification is used alone, and improve the accuracy with which the text field identification model identifies the field to which a text belongs.

The technical solution adopted by the present invention to solve the technical problem is to provide a method for establishing a text field identification model, the method comprising: obtaining texts that have not undergone domain classification; labeling the field to which each text belongs using domain classification templates; and training a classification model with each text as input and the field labeled for each text as output, to obtain a text field identification model, wherein the text field identification model can identify the field to which a text input by a user belongs.

According to a preferred embodiment of the present invention, the domain classification templates are obtained as follows: obtaining common texts of each field; performing word segmentation on each common text to obtain the semantics of each word in the common text; generalizing the common text according to the semantics of each word in it; and using the generalized result of the common text as a domain classification template of the field to which the common text belongs.

According to a preferred embodiment of the present invention, labeling the field to which the text belongs using domain classification templates comprises: performing word segmentation on the text to obtain the semantics of each word in the text; generalizing the text according to the semantics of each word in it to obtain a generalized result of the text; determining whether the generalized result of the text hits a domain classification template; if it does, labeling the field corresponding to the hit domain classification template as the field to which the text belongs; and if it does not, labeling the field to which the text belongs as a preset field.

According to a preferred embodiment of the present invention, determining whether the generalized result of the text hits a domain classification template comprises: calculating the text similarity between the generalized result of the text and the domain classification template; and if the calculated text similarity is greater than a preset threshold, determining that the generalized result of the text hits the domain classification template, and otherwise determining a miss.

According to a preferred embodiment of the present invention, after labeling the field to which the text belongs using domain classification templates, the method further comprises: using the generalized result of the text as a domain classification template of the field to which the text belongs.

The technical solution adopted by the present invention to solve the technical problem is to provide an apparatus for establishing a text field identification model, the apparatus comprising: an acquiring unit, configured to obtain texts that have not undergone domain classification; a labeling unit, configured to label the field to which each text belongs using domain classification templates; and a training unit, configured to train a classification model with each text as input and the field labeled for each text as output, to obtain a text field identification model.

According to a preferred embodiment of the present invention, the labeling unit obtains the domain classification templates as follows: obtaining common texts of each field; performing word segmentation on each common text to obtain the semantics of each word in the common text; generalizing the common text according to the semantics of each word in it; and using the generalized result of the common text as a domain classification template of the field to which the common text belongs.

According to a preferred embodiment of the present invention, when labeling the field to which the text belongs using domain classification templates, the labeling unit specifically performs: performing word segmentation on the text to obtain the semantics of each word in the text; generalizing the text according to the semantics of each word in it to obtain a generalized result of the text; determining whether the generalized result of the text hits a domain classification template; if it does, labeling the field corresponding to the hit domain classification template as the field to which the text belongs; and if it does not, labeling the field to which the text belongs as a preset field.

According to a preferred embodiment of the present invention, when determining whether the generalized result of the text hits a domain classification template, the labeling unit specifically performs: calculating the text similarity between the generalized result of the text and the domain classification template; and if the calculated text similarity is greater than a preset threshold, determining that the generalized result of the text hits the domain classification template, and otherwise determining a miss.

According to a preferred embodiment of the present invention, after labeling the field to which the text belongs using domain classification templates, the labeling unit further performs: using the generalized result of the text as a domain classification template of the field to which the text belongs.

As can be seen from the above technical solutions, the present invention obtains the text field identification model by fusing template classification and model classification. This alleviates the limitations of existing text field identification approaches and effectively avoids the overfitting problem that arises when a classification template or a classification model is used alone for text field identification, thereby achieving a better identification effect.

[brief description of the drawings]

Fig. 1 is a flowchart of a method for establishing a text field identification model provided by an embodiment of the present invention;

Fig. 2 is a structural diagram of an apparatus for establishing a text field identification model provided by an embodiment of the present invention;

Fig. 3 is a block diagram of a computer system/server provided by an embodiment of the present invention.

[specific embodiment]

To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in detail below with reference to the drawings and specific embodiments.

The terms used in the embodiments of the present invention are for the purpose of describing particular embodiments only and are not intended to limit the present invention. The singular forms "a", "said" and "the" used in the embodiments of the present invention and the appended claims are also intended to include plural forms, unless the context clearly indicates otherwise.

It should be understood that the term "and/or" used herein merely describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate three cases: A alone, both A and B, and B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.

Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", "in response to determining" or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)".

Fig. 1 is a flowchart of a method for establishing a text field identification model provided by an embodiment of the present invention. As shown in Fig. 1, the method comprises:

In 101, texts that have not undergone domain classification are obtained.

In this step, texts that have not undergone domain classification are obtained, that is, texts whose fields have not been labeled.

It can be understood that a large number of texts that have not undergone domain classification may be obtained from the Internet by data mining, for example by mining the search queries input by users from web search logs; such texts may also be obtained by manual collection. The present invention does not limit the manner of obtaining texts that have not undergone domain classification.

In 102, the field to which each text belongs is labeled using domain classification templates.

In this step, domain classification templates are used to label the field of each text obtained in step 101 that has not undergone domain classification, thereby determining the field to which each text belongs.

In this step, the domain classification templates may be obtained as follows: obtaining common texts of each field, that is, texts that are common and representative in each field; performing word segmentation on each common text to obtain the semantics of each word in it; generalizing the common text according to the semantics of each word in it; and using the generalized result of the common text as a domain classification template of the field to which the common text belongs. Alternatively, the domain classification templates may already exist, in which case the pre-existing templates are obtained directly and used to label the fields of texts.
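The template-building procedure described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the patent does not specify how word semantics are obtained, so the `SEMANTIC_LEXICON` dictionary, the whitespace-based `segment` function and the bracketed category labels are all assumptions introduced for the example.

```python
# Hypothetical semantic lexicon: maps a word to a semantic category label.
# A real system would use a proper segmenter and semantic tagger.
SEMANTIC_LEXICON = {
    "beijing": "[CITY]",
    "shanghai": "[CITY]",
    "tomorrow": "[DATE]",
    "today": "[DATE]",
}


def segment(text):
    """Word segmentation; whitespace splitting stands in for a real segmenter."""
    return text.lower().split()


def generalize(text):
    """Replace each word that has a known semantic category with that category."""
    return " ".join(SEMANTIC_LEXICON.get(word, word) for word in segment(text))


def build_templates(common_texts_by_field):
    """Map each generalized common text (the template) to its single field."""
    templates = {}
    for field, texts in common_texts_by_field.items():
        for text in texts:
            templates[generalize(text)] = field
    return templates


common_texts = {
    "weather": ["weather in Beijing tomorrow"],
    "chat": ["how are you today"],
}
templates = build_templates(common_texts)
print(templates)
```

Storing templates as a mapping from template string to field keeps each template tied to exactly one field while allowing any number of templates per field.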

It can be understood that texts whose frequency of occurrence in a field exceeds a preset frequency may be used as the common texts of that field; N texts may also be randomly selected from the texts of each field as the common texts of that field, where N is a positive integer greater than or equal to 1; the common texts of each field may also be obtained by manual collection. The present invention does not limit the manner of obtaining the common texts of each field.
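The two automatic selection strategies mentioned above might look like the following sketch. Treating "frequency" as a raw occurrence count, the function names and the fixed random seed are assumptions made for illustration only.

```python
import random
from collections import Counter


def common_by_frequency(texts, min_count):
    """Keep texts whose occurrence count exceeds the preset frequency."""
    counts = Counter(texts)
    return [text for text, count in counts.items() if count > min_count]


def common_by_random(texts, n, seed=0):
    """Randomly pick N distinct texts (N >= 1) as the field's common texts."""
    unique = sorted(set(texts))
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    return rng.sample(unique, min(n, len(unique)))


weather_log = [
    "weather today", "weather today", "weather today",
    "will it rain", "will it rain",
    "is it cold",
]
frequent = common_by_frequency(weather_log, min_count=1)
picked = common_by_random(weather_log, n=2)
print(frequent)
print(picked)
```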

A domain classification template is used to classify the field to which a text belongs. Each domain classification template corresponds to exactly one field, but one field may have multiple different domain classification templates.

Specifically, when labeling the field to which a text belongs using domain classification templates, this step may proceed as follows: performing word segmentation on the text to obtain the semantics of each word in the text; generalizing the text according to the semantics of each word in it to obtain a generalized result of the text; determining whether the generalized result of the text hits a domain classification template; if it does, labeling the field corresponding to the hit domain classification template as the field to which the text belongs; and if it does not, labeling the field to which the text belongs as a preset field. The preset field may be a field set separately by the user, or any one of the fields.
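As a rough sketch of this labeling step, the code below treats a "hit" as an exact match between the generalized text and a template, which is the simplest possible reading and an assumption of this example. The lexicon, the template table and the choice of "chat" as the preset field are likewise hypothetical.

```python
# Hypothetical lexicon, template table and preset field for illustration.
SEMANTIC_LEXICON = {"beijing": "[CITY]", "shanghai": "[CITY]", "tomorrow": "[DATE]"}
TEMPLATES = {"weather in [CITY] [DATE]": "weather"}
PRESET_FIELD = "chat"


def generalize(text):
    """Segment on whitespace and replace known words with semantic categories."""
    return " ".join(SEMANTIC_LEXICON.get(w, w) for w in text.lower().split())


def label(text, templates=TEMPLATES, preset_field=PRESET_FIELD):
    """Return the field of the hit template, or the preset field on a miss."""
    generalized = generalize(text)
    return templates.get(generalized, preset_field)


print(label("weather in Shanghai tomorrow"))
print(label("tell me a joke"))
```

Because "Shanghai" and "Beijing" generalize to the same category, a single template built from one city also labels queries about the other.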

It can be understood that whether the generalized result of a text hits a domain classification template may be determined as follows: calculating the text similarity between the generalized result of the text and the domain classification template; and determining whether the calculated text similarity is greater than a preset threshold. If it is, the generalized result of the text is determined to hit the domain classification template; otherwise, the generalized result of the text misses the domain classification template.
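The threshold test can be illustrated with the sketch below. The patent does not name a similarity measure, so token-level Jaccard similarity is swapped in here as an assumption; returning the field of the best-scoring template and the threshold value 0.6 are also choices of this example.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity between two generalized strings."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def find_hit(generalized, templates, threshold=0.6):
    """Return the field of the most similar template if it exceeds the threshold."""
    best_field, best_score = None, 0.0
    for template, field in templates.items():
        score = jaccard(generalized, template)
        if score > best_score:
            best_field, best_score = field, score
    # A hit requires similarity strictly greater than the preset threshold.
    return best_field if best_score > threshold else None


templates = {
    "weather in [CITY] [DATE]": "weather",
    "play a song by [ARTIST]": "music",
}
print(find_hit("weather in [CITY] [DATE] please", templates))
print(find_hit("tell me a joke", templates))
```

A `None` result corresponds to a miss, in which case the caller would fall back to the preset field.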

In addition, after labeling the field to which a text belongs using domain classification templates, this step may further include: using the generalized result of the text as a domain classification template of the field to which the text belongs. That is, after the field of a text has been labeled, the generalized result of the text is used as a domain classification template of the corresponding field and is then used to label the fields of other texts, and this cycle repeats.

In other words, after the field of a text has been labeled in this step, the generalized result of the text is used as a domain classification template of that field, so that the domain classification templates in each field keep increasing. This further improves the classification capability of the templates in each field, so that the fields of texts are labeled more accurately.
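The template-growing cycle might be sketched as follows. Several details are assumptions of this example rather than statements from the patent: Jaccard similarity is used as the text similarity, only texts that hit an existing template are added as new templates, and the inputs are taken to be already generalized strings.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity between two generalized strings."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def label_and_grow(generalized_texts, templates, preset_field, threshold=0.7):
    """Label each text and add every hit as a new template of its field."""
    labeled = []
    for generalized in generalized_texts:
        best_field, best_score = None, 0.0
        for template, field in templates.items():
            score = jaccard(generalized, template)
            if score > best_score:
                best_field, best_score = field, score
        if best_score > threshold:
            # Hit: reuse the generalized result as a template for later texts.
            templates.setdefault(generalized, best_field)
            labeled.append((generalized, best_field))
        else:
            labeled.append((generalized, preset_field))
    return labeled


templates = {"weather in [CITY] [DATE]": "weather"}
texts = [
    "weather in [CITY] [DATE] please",
    "weather please in [CITY]",
    "tell me a joke",
]
labeled = label_and_grow(texts, templates, preset_field="chat")
print(labeled)
print(len(templates))
```

In this run the second text scores only 0.6 against the seed template but 0.8 against the template added from the first text, so it is labeled "weather" only because the template set has grown.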

It can be understood that, in addition to a large number of texts with labeled fields, this step also yields a large number of domain classification templates that can identify the fields of texts. Therefore, the present invention can also identify the field of a text input by a user using only the full set of domain classification templates obtained, without first training the classification model and then using the trained text field identification model to obtain the field of the text.

Because this step labels the fields of texts using domain classification templates, a large number of texts with labeled fields can be obtained without manual labeling, and these labeled texts are then used to train the text field identification model.

In 103, a classification model is trained with each text as input and the field labeled for each text as output, to obtain the text field identification model.

In this step, each text and the field labeled for it in step 102 are used as a training sample: each text serves as input and its labeled field as output to train the classification model, thereby obtaining the text field identification model. The trained text field identification model can then obtain the field to which a text belongs from the text input by a user.

The classification model may be a support vector machine, a neural network model, a deep learning model or the like; the present invention does not limit the type of the classification model.
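To make the training step concrete, the sketch below trains a classifier on template-labeled texts. To keep the example dependency-free, a multinomial naive Bayes classifier with add-one smoothing is swapped in for the model types named above; the class name, the whitespace tokenization and the toy training data are assumptions of this example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesFieldClassifier:
    """Multinomial naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, texts, fields):
        self.field_counts = Counter(fields)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, field in zip(texts, fields):
            words = text.lower().split()
            self.word_counts[field].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.field_counts.values())
        best_field, best_logp = None, -math.inf
        for field, doc_count in self.field_counts.items():
            logp = math.log(doc_count / total_docs)  # log prior of the field
            denom = sum(self.word_counts[field].values()) + len(self.vocab)
            for word in words:
                logp += math.log((self.word_counts[field][word] + 1) / denom)
            if logp > best_logp:
                best_field, best_logp = field, logp
        return best_field


# Texts as input, template-assigned fields as output (toy data).
train_texts = [
    "weather in beijing tomorrow",
    "will it rain in shanghai",
    "play a song by adele",
    "play some jazz music",
]
train_fields = ["weather", "weather", "music", "music"]
model = NaiveBayesFieldClassifier().fit(train_texts, train_fields)
print(model.predict("will it rain tomorrow"))
print(model.predict("play a jazz song"))
```

Any classifier with a fit/predict interface could take the place of this class; the point is only that the training labels come from the template-labeling step rather than from manual annotation.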

Because step 102 yields a large number of texts with labeled fields, the text field identification model trained in this step on sufficient labeled data can identify the field of a text more accurately.

By fusing template classification and model classification to identify the fields of texts, the present invention can, at the initial stage of training, obtain a large number of texts with labeled fields from a small number of domain classification templates and correspondingly obtain a large number of domain classification templates, which removes the need to configure many templates manually and reduces labor costs. At the later stage of training, the classification model can be trained on sufficient texts with labeled fields. This alleviates the limitations of existing classification approaches in different training periods and effectively avoids the overfitting problem that arises when a classification template or a classification model is used alone for text field identification, so that the resulting text field identification model achieves a better identification effect.

Fig. 2 is a structural diagram of an apparatus for establishing a text field identification model provided by an embodiment of the present invention. As shown in Fig. 2, the apparatus comprises: an acquiring unit 21, a labeling unit 22 and a training unit 23.

The acquiring unit 21 is configured to obtain texts that have not undergone domain classification.

The acquiring unit 21 obtains texts that have not undergone domain classification, that is, texts whose fields have not been labeled.

It can be understood that the acquiring unit 21 may obtain a large number of texts that have not undergone domain classification from the Internet by data mining, for example by mining the search queries input by users from web search logs; the acquiring unit 21 may also obtain such texts by manual collection. The present invention does not limit the manner of obtaining texts that have not undergone domain classification.

The labeling unit 22 is configured to label the field to which each text belongs using domain classification templates.

The labeling unit 22 uses domain classification templates to label the field of each text obtained by the acquiring unit 21 that has not undergone domain classification, thereby determining the field to which each text belongs.

The labeling unit 22 may obtain the domain classification templates as follows: obtaining common texts of each field, that is, texts that are common and representative in each field; performing word segmentation on each common text to obtain the semantics of each word in it; generalizing the common text according to the semantics of each word in it; and using the generalized result of the common text as a domain classification template of the field to which the common text belongs.

It can be understood that the labeling unit 22 may use texts whose frequency of occurrence in a field exceeds a preset frequency as the common texts of that field; the labeling unit 22 may also randomly select N texts from the texts of each field as the common texts of that field, where N is a positive integer greater than or equal to 1; the labeling unit 22 may also obtain the common texts of each field by manual collection. The present invention does not limit the manner of obtaining the common texts of each field.

Specifically, when labeling the field to which a text belongs using domain classification templates, the labeling unit 22 may proceed as follows: performing word segmentation on the text to obtain the semantics of each word in the text; generalizing the text according to the semantics of each word in it to obtain a generalized result of the text; determining whether the generalized result of the text hits a domain classification template; if it does, labeling the field corresponding to the hit domain classification template as the field to which the text belongs; and if it does not, labeling the field to which the text belongs as a preset field. The preset field may be a field set separately by the user, or any one of the fields.

It can be understood that the labeling unit 22 may determine whether the generalized result of a text hits a domain classification template as follows: calculating the text similarity between the generalized result of the text and the domain classification template; and determining whether the calculated text similarity is greater than a preset threshold. If it is, the generalized result of the text is determined to hit the domain classification template; otherwise, the generalized result of the text misses the domain classification template.

In addition, after labeling the field to which a text belongs using domain classification templates, the labeling unit 22 may further perform the following: using the generalized result of the text as a domain classification template of the field to which the text belongs. That is, after the field of a text has been labeled, the labeling unit 22 uses the generalized result of the text as a domain classification template of the corresponding field and then uses it to label the fields of other texts. This cycle repeats, so that a large number of domain classification templates are obtained without manual effort.

In other words, after the field of a text has been labeled, the labeling unit 22 uses the generalized result of the text as a domain classification template of that field, so that the domain classification templates in each field keep increasing. This further improves the classification capability of the templates in each field, so that the fields of texts are labeled more accurately.

It can be understood that, in addition to a large number of texts with labeled fields, the labeling unit 22 also yields a large number of domain classification templates that can identify the fields of texts. Therefore, the present invention can also identify the field of a text input by a user using only the full set of domain classification templates obtained, without first training the classification model and then using the trained text field identification model to obtain the field of the text.

Because the labeling unit 22 labels the fields of texts using domain classification templates, a large number of texts with labeled fields can be obtained without manual labeling, and these labeled texts are then used to train the text field identification model.

The training unit 23 is configured to train a classification model with each text as input and the field labeled for each text as output, to obtain the text field identification model.

The training unit 23 uses each text and the field labeled for it by the labeling unit 22 as a training sample: each text serves as input and its labeled field as output to train the classification model, thereby obtaining the text field identification model. The text field identification model trained by the training unit 23 can then obtain the field to which a text belongs from the text input by a user.

Because the labeling unit 22 yields a large number of texts with labeled fields, the text field identification model trained by the training unit 23 on sufficient labeled data can effectively avoid the overfitting problem and thus identify the field of a text more accurately.

Fig. 3 shows a block diagram of an exemplary computer system/server 012 suitable for implementing embodiments of the present invention. The computer system/server 012 shown in Fig. 3 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.

As shown in Fig. 3, the computer system/server 012 takes the form of a general-purpose computing device. Its components may include, but are not limited to: one or more processors or processing units 016, a system memory 028, and a bus 018 connecting different system components (including the system memory 028 and the processing unit 016).

The bus 018 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus structures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus.

The computer system/server 012 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the computer system/server 012, including volatile and non-volatile media, and removable and non-removable media.

The system memory 028 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 030 and/or a cache memory 032. The computer system/server 012 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a storage system 034 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Fig. 3, commonly referred to as a "hard disk drive"). Although not shown in Fig. 3, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM or other optical medium) may be provided. In these cases, each drive may be connected to the bus 018 through one or more data media interfaces. The memory 028 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.

A program/utility 040 having a set of (at least one) program modules 042 may be stored, for example, in the memory 028. Such program modules 042 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 042 generally perform the functions and/or methods of the embodiments described in the present invention.

The computer system/server 012 may also communicate with one or more external devices 014 (such as a keyboard, a pointing device, a display 024 and the like). In the present invention, the computer system/server 012 communicates with an external radar device; it may also communicate with one or more devices that enable a user to interact with the computer system/server 012, and/or with any device (such as a network card, a modem and the like) that enables the computer system/server 012 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 022. The computer system/server 012 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 020. As shown in the figure, the network adapter 020 communicates with the other modules of the computer system/server 012 through the bus 018. It should be understood that, although not shown in Fig. 3, other hardware and/or software modules may be used in conjunction with the computer system/server 012, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives and data backup storage systems.

The processing unit 016 runs the programs stored in the system memory 028 and thereby performs various functional applications and data processing, for example implementing the method flow provided by the embodiments of the present invention.

The above computer program may be provided in a computer storage medium, that is, the computer storage medium is encoded with a computer program which, when executed by one or more computers, causes the one or more computers to perform the method flows and/or apparatus operations shown in the above embodiments of the present invention. For example, the method flow provided by the embodiments of the present invention is executed by the one or more processors described above.

With the passage of time and the development of technology, the meaning of "medium" has become increasingly broad, and the distribution of computer programs is no longer limited to tangible media; programs may also be downloaded directly from a network, and so on. Any combination of one or more computer-readable media may be used. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program, which may be used by or in connection with an instruction execution system, apparatus or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device.

The program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.

Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

With the technical solution provided by the present invention, the text field identification model is obtained by fusing template classification with model classification. This alleviates the limitations of existing text field identification approaches and effectively avoids the overfitting problem that arises when a classification template or a classification model is used alone for text field identification, thereby improving the accuracy with which the text field identification model identifies the field to which a text belongs.
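The fusion described above can be illustrated with a minimal sketch, given here only as an assumption-laden example and not as the implementation of the invention. Unlabeled texts are first labeled by templates, and the resulting pairs are then used to train a classifier. The keyword templates in `TEMPLATES`, the preset field `DEFAULT_FIELD`, the sample texts, and the small `NaiveBayes` class are all hypothetical; the source does not specify which classification model is trained.

```python
import math
from collections import Counter, defaultdict

# Hypothetical keyword templates: field -> trigger words (illustrative only).
TEMPLATES = {
    "recommendation": {"recommend", "suggest"},
    "question_answering": {"what", "who", "how"},
}
DEFAULT_FIELD = "chat"  # assumed preset field for texts that hit no template


def label_with_templates(text):
    """Return the field of the first template whose trigger words appear in the text."""
    words = set(text.lower().split())
    for field, triggers in TEMPLATES.items():
        if words & triggers:
            return field
    return DEFAULT_FIELD


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (a stand-in classifier)."""

    def fit(self, texts, fields):
        self.word_counts = defaultdict(Counter)
        self.field_counts = Counter(fields)
        for text, field in zip(texts, fields):
            self.word_counts[field].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.field_counts.values())
        best_field, best_score = None, -math.inf
        for field, n in self.field_counts.items():
            counts = self.word_counts[field]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior of the field
            for w in text.lower().split():
                score += math.log((counts[w] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_field, best_score = field, score
        return best_field


# Texts that have not been classified by field (made-up examples).
unlabeled = [
    "recommend a good movie",
    "suggest a restaurant nearby",
    "what is the tallest mountain",
    "who wrote this book",
    "hello there",
    "good morning",
]
labels = [label_with_templates(t) for t in unlabeled]  # template labeling
model = NaiveBayes().fit(unlabeled, labels)            # model training
```

In this toy setting, the text "a good restaurant nearby" contains no trigger word and would fall into the preset field under the templates alone, whereas the trained model assigns it to "recommendation" because of the words it shares with the template-labeled texts. This is the kind of coverage gain the fusion is meant to provide.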

In the several embodiments provided by the present invention, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in an actual implementation.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

The above integrated unit, when implemented in the form of a software functional unit, may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for establishing a text field identification model, characterized in that the method comprises:

Obtaining texts that have not been classified by field;

Labeling the field to which each text belongs using domain classification templates;

Training a classification model with each text as input and the field labeled for each text as output, to obtain the text field identification model;

Wherein the text field identification model can be used to identify the field to which a text input by a user belongs.

2. The method according to claim 1, characterized in that the domain classification template is obtained as follows:

Obtaining commonly used texts of each field;

Performing word segmentation on the commonly used texts to obtain the semantics of each word in the commonly used texts;

Generalizing the commonly used texts according to the semantics of each word in the commonly used texts;

Using the generalization result of a commonly used text as a domain classification template for the field to which that commonly used text belongs.
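As a hedged illustration of the template construction steps listed above, the following sketch segments each commonly used text, replaces words with known semantics by a slot, and keeps the generalized text as a template of its field. The lexicon `SEMANTICS`, the slot notation such as `[city]`, the whitespace segmenter, and the sample texts are assumptions made for the example; the source does not specify how word semantics are obtained.

```python
# Hypothetical semantic lexicon: word -> semantic category (illustrative only).
SEMANTICS = {
    "beijing": "[city]",
    "shanghai": "[city]",
    "today": "[time]",
    "tomorrow": "[time]",
}


def segment(text):
    """Whitespace split as a stand-in for a real word segmenter."""
    return text.lower().split()


def generalize(text):
    """Replace each word that has a known semantic category with its slot."""
    return " ".join(SEMANTICS.get(word, word) for word in segment(text))


def build_templates(common_texts_by_field):
    """Map each field to the set of generalized forms of its commonly used texts."""
    return {
        field: {generalize(text) for text in texts}
        for field, texts in common_texts_by_field.items()
    }


common_texts = {
    "weather": ["weather in Beijing today", "weather in Shanghai tomorrow"],
    "chat": ["how are you"],
}
templates = build_templates(common_texts)
```

Both weather texts collapse into the single template "weather in [city] [time]", which shows how generalization lets one template cover many surface forms.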

3. The method according to claim 1, characterized in that labeling the field to which the text belongs using the domain classification template comprises:

Performing word segmentation on the text to obtain the semantics of each word in the text;

Generalizing the text according to the semantics of each word in the text, to obtain a generalization result of the text;

Determining whether the generalization result of the text hits the domain classification template;

If the generalization result of the text hits the domain classification template, labeling the field corresponding to the hit domain classification template as the field to which the text belongs;

If the generalization result of the text does not hit the domain classification template, labeling the field to which the text belongs as a preset field.
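The labeling steps above can be sketched as follows. This is only an illustrative example: a "hit" is assumed here to mean an exact match between the generalized text and a template, and the lexicon `SEMANTICS`, the templates in `TEMPLATES`, and the preset field name `PRESET_FIELD` are hypothetical.

```python
# Hypothetical semantic lexicon and templates (illustrative only).
SEMANTICS = {
    "beijing": "[city]",
    "shanghai": "[city]",
    "today": "[time]",
    "tomorrow": "[time]",
}
TEMPLATES = {
    "weather": {"weather in [city] [time]"},
    "chat": {"how are you"},
}
PRESET_FIELD = "other"  # assumed preset field used on a miss


def generalize(text):
    """Segment on whitespace and replace words that have a semantic slot."""
    return " ".join(SEMANTICS.get(w, w) for w in text.lower().split())


def label_field(text):
    """Label a text with the field of the template it hits, else the preset field."""
    generalized = generalize(text)
    for field, patterns in TEMPLATES.items():
        if generalized in patterns:  # exact match is the assumed hit criterion
            return field
    return PRESET_FIELD
```

For example, `label_field("Weather in Shanghai tomorrow")` yields "weather", while a text such as "tell me a joke" hits no template and receives the preset field.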

4. The method according to claim 3, characterized in that determining whether the generalization result of the text hits the domain classification template comprises:

Calculating the text similarity between the generalization result of the text and the domain classification template;

If the calculated text similarity is greater than a preset threshold, determining that the generalization result of the text hits the domain classification template; otherwise, determining a miss.
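A minimal sketch of the similarity-based hit test is given below. The source does not name a similarity measure, so Jaccard similarity over word sets is used purely as an assumption, and the threshold value 0.6 is likewise hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between the word sets of two generalized texts."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def hits(generalized_text, template, threshold=0.6):
    """A hit requires similarity strictly greater than the preset threshold."""
    return jaccard(generalized_text, template) > threshold
```

With these assumptions, "weather in [city] [time] please" shares four of five distinct words with the template "weather in [city] [time]" (similarity 0.8) and counts as a hit, whereas a similarity exactly equal to the threshold counts as a miss because the comparison is strict.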

5. The method according to claim 1, characterized in that, after labeling the field to which the text belongs using the domain classification template, the method further comprises:

Using the generalization result of the text as a domain classification template for the field to which the text belongs.
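The template expansion step above can be sketched as follows, under several assumptions: hits are decided by Jaccard similarity over word sets with a hypothetical threshold of 0.6, and a generalization result is added as a new template only when the text hit an existing template. Whether a text labeled with the preset field should also contribute a template is not stated in the source, so the sketch leaves that case out.

```python
def jaccard(a, b):
    """Jaccard similarity between the word sets of two generalized texts."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def label_and_expand(generalized_text, templates, preset_field="other", threshold=0.6):
    """Label a generalized text and, on a hit, store it as a new template of that field."""
    for field, patterns in templates.items():
        if any(jaccard(generalized_text, p) > threshold for p in patterns):
            patterns.add(generalized_text)  # the template set of this field grows
            return field
    return preset_field  # miss: no template is added in this sketch


templates = {"weather": {"weather in [city] [time]"}}
first = label_and_expand("weather in [city] [time] please", templates)
second = label_and_expand("tell me a joke", templates)
```

After the first call the "weather" field holds two templates, so later texts that resemble the new variant can also be labeled; the second call returns the preset field and leaves the templates unchanged.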

6. An apparatus for establishing a text field identification model, characterized in that the apparatus comprises:

An acquiring unit, configured to obtain texts that have not been classified by field;

A labeling unit, configured to label the field to which each text belongs using domain classification templates;

A training unit, configured to train a classification model with each text as input and the field labeled for each text as output, to obtain the text field identification model.

7. The apparatus according to claim 6, characterized in that the labeling unit obtains the domain classification template as follows:

Obtaining commonly used texts of each field;

8. The apparatus according to claim 6, characterized in that, when labeling the field to which the text belongs using the domain classification template, the labeling unit specifically performs:

9. The apparatus according to claim 8, characterized in that, when determining whether the generalization result of the text hits the domain classification template, the labeling unit specifically performs:

10. The apparatus according to claim 6, characterized in that, after labeling the field to which the text belongs using the domain classification template, the labeling unit further performs:

11. A device, characterized in that the device comprises:

One or more processors;

A storage apparatus, configured to store one or more programs,

Wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 5.

12. A storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the method according to any one of claims 1 to 5.