CN107451106A - Text correction method and apparatus, and electronic device - Google Patents

Text correction method and apparatus, and electronic device

Info

Publication number
CN107451106A
Authority
CN
China
Prior art keywords
text
corrected
coding
network
rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710618374.3A
Other languages
Chinese (zh)
Inventor
陈永环
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201710618374.3A priority Critical patent/CN107451106A/en
Publication of CN107451106A publication Critical patent/CN107451106A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding
    • G06F40/129Handling non-Latin characters, e.g. kana-to-kanji conversion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/151Transformation

Abstract

One or more embodiments of this specification provide a text correction method and apparatus, and an electronic device. In the text correction method, a text to be corrected is first obtained; a feature vector corresponding to the text to be corrected is then determined using a coding rule; finally, the feature vector is input into a text correction model, which outputs a standard text corresponding to the text to be corrected. The text correction model includes a coding network and a decoding network, and both the coding network and the decoding network are recurrent neural networks (RNNs). The text correction method and apparatus and the electronic device provided herein can improve the text recognition rate.

Description

Text correction method and apparatus, and electronic device
Technical field
One or more embodiments of this specification relate to the field of machine learning, and in particular to a text correction method and apparatus, and an electronic device.
Background technology
Techniques for recognizing various types of text are used in many scenarios, for example, identifying blacklisted users. The text to be recognized may be, for example, a person's name, a place name, a company name, and so on.
At present, if the text to be recognized is not in a predefined canonical form (for example, because of a spelling error or a non-standard abbreviation), the text may fail to be recognized. For example, the canonical form of a certain text is "baidu", but because of a spelling error the text actually entered is "baido". It can therefore be seen that there is a need to correct text.
Summary of the invention
In view of this, one or more embodiments of this specification provide a text correction method and apparatus, and an electronic device.
To achieve the above object, the technical solutions provided by one or more embodiments of this specification are as follows:
A text correction method, comprising:
obtaining a text to be corrected;
determining, using a coding rule, a feature vector corresponding to the text to be corrected; and
inputting the feature vector into a text correction model and outputting a standard text corresponding to the text to be corrected, wherein the text correction model includes a coding network and a decoding network, and both the coding network and the decoding network are recurrent neural networks (RNNs).
A text correction apparatus, comprising:
a text obtaining module, configured to obtain a text to be corrected;
a conversion module, configured to determine, using a coding rule, a feature vector corresponding to the text to be corrected; and
a text correction module, configured to input the feature vector into a text correction model and output a standard text corresponding to the text to be corrected, wherein the text correction model includes a coding network and a decoding network, and both the coding network and the decoding network are recurrent neural networks (RNNs).
An electronic device, comprising:
a processor; and
a memory for storing processor-executable instructions,
wherein the processor is configured to:
obtain a text to be corrected;
determine, using a coding rule, a feature vector corresponding to the text to be corrected; and
input the feature vector into a text correction model and output a standard text corresponding to the text to be corrected, wherein the text correction model includes a coding network and a decoding network, and both the coding network and the decoding network are recurrent neural networks (RNNs).
It can be seen from the above technical solutions that, based on a text correction model that includes a coding network and a decoding network and is obtained through machine learning, after a text to be corrected is obtained, the feature vector corresponding to the text to be corrected can be input into the text correction model to output a standard text. This realizes the function of correcting text and can therefore improve the text recognition rate in text recognition scenarios.
Brief description of the drawings
Fig. 1 shows the structure of a text correction model according to an exemplary embodiment;
Fig. 2 is a flowchart of a text correction method according to an exemplary embodiment;
Fig. 3 is a block diagram of a text correction apparatus according to an exemplary embodiment;
Fig. 4 is a block diagram of another text correction apparatus according to an exemplary embodiment.
Detailed description of the embodiments
This specification proposes a text correction method in which a text correction model obtained through machine learning is used to correct text. The text correction model may be a seq2seq (Sequence-to-Sequence) model. The text that the seq2seq model can be used to correct includes, but is not limited to, the names of various objects (such as places, people, and companies) and query terms used for searching. Each standard text may correspond to multiple non-standard texts: the standard text is a predefined standard expression, while a non-standard text is a variation in which some characters of the standard expression have been changed. For example, if a standard text is "Luck did better than Huan", the corresponding non-standard texts may be "Luck did better then Huan", "Luck do better than Huan", and so on. In a real text recognition scenario, it is desirable to recognize non-standard texts caused by rewriting, misspelling, or the like as their corresponding standard texts, so as to achieve a higher text recognition rate.
Fig. 1 shows the structure of a text correction model according to an exemplary embodiment. As shown in Fig. 1, the text correction model includes a coding (Encoder) network and a decoding (Decoder) network. Both the coding network and the decoding network may be recurrent neural networks (RNNs), such as long short-term memory (LSTM) networks. The input of the coding network may be the feature vector (x1, x2, ..., xn) corresponding to the input text (the text to be recognized), where x1, x2, ..., xn may each represent a character in the input text. The coding network may be used to encode the input text into a fixed-length vector, which serves as the input of the decoding network. The decoding network may be used to decode the output of the coding network and output a vector (y1, y2, ..., ym), from which the standard text can finally be determined, where y1, y2, ..., ym may each represent a character in the standard text. An RNN generally includes several nodes; each node computes its output from its input, and the output of a node depends on the output of the previous node (the output of the previous node serves as an input of the next node). In practice, an RNN can process text of arbitrary length (that is, the values of n and m are not fixed).
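To make this encoder-decoder structure concrete, the following is a minimal sketch in PyTorch (an illustration, not part of the patent; the class name Seq2SeqCorrector, the embedding and hidden sizes, and the use of PyTorch are assumptions): a character-level LSTM encoder compresses the input sequence into a fixed-length state, which an LSTM decoder then unrolls into the output character sequence.

```python
import torch
import torch.nn as nn

class Seq2SeqCorrector(nn.Module):
    """Character-level encoder-decoder (seq2seq) text correction sketch."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # coding network
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # decoding network
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Encode the input characters (x1, ..., xn) into a fixed-length state (h, c).
        _, state = self.encoder(self.embed(src_ids))
        # Decode into per-step scores over output characters (y1, ..., ym),
        # conditioned on the encoder state (teacher forcing with tgt_ids).
        dec_out, _ = self.decoder(self.embed(tgt_ids), state)
        return self.out(dec_out)
```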
A characteristic of LSTM is that it adds to the algorithm a "processor" (referred to as a cell) that judges whether information is useful. Three gates are generally placed in a cell, called the input gate, the forget gate, and the output gate. When a piece of information enters the LSTM network, whether it is useful is judged according to rules. Only information that passes the check is retained; information that does not is discarded through the forget gate. LSTM is a technique well known to those skilled in the art and is not described in detail here.
In simple terms, its working principle is nothing more than one input and two outputs, yet through repeated computation it solves a long-standing problem in neural networks. It has been shown that LSTM is an effective technique for handling long-range dependencies in sequences, and the technique is highly general, which opens up many possibilities. Researchers have proposed their own variants of LSTM, allowing LSTM to handle a wide variety of problems.
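To make the three gates described above concrete, the following is a minimal sketch of a single LSTM step in plain NumPy (an assumed illustration, not part of the patent; the parameter dictionaries W, U, and b are placeholders):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM cell step; W, U, b are dicts of weight matrices and bias vectors."""
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])  # input gate: admit new information
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])  # forget gate: drop useless state
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])  # output gate: expose state
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])  # candidate cell content
    c = f * c_prev + i * g                              # updated cell state
    h = o * np.tanh(c)                                  # updated hidden state
    return h, c
```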
Next, the process of obtaining the above text correction model through machine learning is introduced. In one embodiment, the process of training the text correction model may include the following steps 10 to 30:
Step 10: obtain a sample set that includes a number of sample pairs, where each sample pair includes a non-standard text and a standard text.
For example, an acquired sample set may contain pairs such as (X = "baido", Y = "baidu") and (X = "Luck do better than Huan", Y = "Luck did better than Huan"), where X represents the non-standard text and Y represents the standard text.
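A minimal sketch of how such a sample set might be represented (an assumed illustration using only the example texts that appear in this description):

```python
# Each sample pair is (X = non-standard text, Y = standard text).
sample_set = [
    ("baido", "baidu"),
    ("Luck did better then Huan", "Luck did better than Huan"),
    ("Luck do better than Huan", "Luck did better than Huan"),
]
```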
Step 20: for each sample pair, convert the non-standard text into a first coding vector using a coding rule, and convert the standard text into a second coding vector.
In one embodiment, coding rules corresponding to the character types in the text may be selected to determine the coding vectors. In one embodiment, when the character type of a sample is detected to be Chinese, a Chinese character coding rule may be used to encode each Chinese character in the text to obtain a coded sequence; if the character type of a sample is detected to be non-Chinese (for example, English), the non-Chinese characters in the text are encoded using an ASCII coding rule to obtain a coded sequence. Chinese character coding refers to the character codes used to represent Chinese characters in a computer; Chinese character coding rules include, for example, one-hot coding rules, Chinese internal code coding rules, Chinese character international code coding rules, and position code coding rules. By encoding Chinese characters and non-Chinese characters with a Chinese character coding rule and an ASCII coding rule respectively, some unconventional words can also be corrected. An unconventional word may be defined as a word not included in a dictionary, such as "daueoeo". In some text correction scenarios, person names, place names, and the like usually need to be corrected, and such names are often user-defined words (that is, unconventional words), which limits the text correction effect; the above coding rules can effectively improve the text correction effect.
When vectorizing a coded sequence, vectorization rules corresponding to the different character types may be used. Specifically, every N (N >= 1) bits in the coded sequence may be converted, one group at a time and in order, into one numerical value of the feature vector.
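A minimal sketch of selecting a coding rule by character type and vectorizing the resulting coded sequence (an assumed illustration; the helper names is_chinese, encode_text, and vectorize, and the 8-bit grouping, follow the "baido" example given later in this description):

```python
def is_chinese(text):
    # Treat text containing CJK Unified Ideographs as Chinese.
    return any('\u4e00' <= ch <= '\u9fff' for ch in text)

def encode_text(text):
    """Select a coding rule by character type and return a coded (bit) sequence."""
    if is_chinese(text):
        # Placeholder for a Chinese character coding rule (e.g. one-hot or internal code).
        raise NotImplementedError("apply a Chinese character coding rule here")
    # ASCII coding rule: each character becomes its 8-bit ASCII code.
    return "".join(format(ord(ch), "08b") for ch in text)

def vectorize(coded_sequence, n_bits=8):
    """Convert every n_bits bits of the coded sequence into one numerical value."""
    return [int(coded_sequence[i:i + n_bits], 2)
            for i in range(0, len(coded_sequence), n_bits)]
```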
Step 30: train the coding network and the decoding network using the first coding vectors and the second coding vectors, so as to obtain a text correction model that includes the coding network and the decoding network.
Specifically, the sample set used for training the model includes a number of sample pairs (X, Y). Let Xi represent a non-standard text (i.e., the input of the coding network) and Yi represent the corresponding standard text (i.e., the output of the decoding network), where 1 <= i <= N and N is the number of sample pairs. The value of P(Yi | Xi) can be obtained through the coding network and the decoding network, and the conditional likelihood can then be maximized by an expectation-maximization (EM) algorithm, i.e.:
θ* = argmax_θ Σ_{i=1..N} log P(Yi | Xi; θ)
where θ represents the parameters to be trained in the seq2seq model.
In an alternative embodiment, the seq2seq model may be trained by a gradient descent (steepest descent) algorithm.
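A minimal sketch of such gradient-based training (an assumed illustration; it reuses the Seq2SeqCorrector class sketched above, and the data format, optimizer choice, and hyperparameters are placeholders):

```python
import torch
import torch.nn as nn

def train(model, pairs, epochs=10, lr=0.1):
    """pairs: list of (src_ids, tgt_ids) LongTensors of shape (1, seq_len);
    tgt_ids is assumed to start with a start-of-sequence token."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # gradient (steepest) descent
    loss_fn = nn.CrossEntropyLoss()                         # negative log-likelihood
    for _ in range(epochs):
        for src_ids, tgt_ids in pairs:
            logits = model(src_ids, tgt_ids[:, :-1])        # predict the next character
            loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                           tgt_ids[:, 1:].reshape(-1))      # -log P(Y | X; theta)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                                 # update theta
    return model
```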
Fig. 2 is a flowchart of a text correction method according to an exemplary embodiment. The method can be applied to various types of electronic devices (for example, user equipment or servers) and is implemented using the seq2seq model obtained above through machine learning. The method may include the following steps:
Step 101: obtain a text to be corrected.
Ways of obtaining the text to be corrected include, but are not limited to: receiving a text to be corrected entered by a user; extracting a particular text fragment from a text entered by a user as the text to be corrected; or using, as the text to be corrected, the account information with which a client device is to log in to a server device; and so on.
Step 103: determine, using a coding rule, a feature vector corresponding to the text to be corrected.
In one embodiment, step 103 may include:
Step 131: determine, one by one using the coding rule, the code corresponding to each character in the text to be corrected, so as to obtain a coded sequence corresponding to the text to be corrected. For example, for the text to be corrected "baido", the coded sequence is "0110001001100001011010010110010001101111".
Step 132: determine, according to the coded sequence, the feature vector corresponding to the text to be corrected. For example, for the coded sequence 0110001001100001011010010110010001101111, the determined feature vector is (98, 97, 105, 100, 111).
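Steps 131 and 132 can be reproduced in a few lines of Python (an assumed illustration; it follows the ASCII coding rule and the 8-bit grouping used in the example above):

```python
text = "baido"
# Step 131: coded sequence (8-bit ASCII code per character).
coded_sequence = "".join(format(ord(ch), "08b") for ch in text)
# -> '0110001001100001011010010110010001101111'
# Step 132: every 8 bits become one value of the feature vector.
feature_vector = [int(coded_sequence[i:i + 8], 2)
                  for i in range(0, len(coded_sequence), 8)]
# -> [98, 97, 105, 100, 111]
```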
Step 105: input the feature vector into the text correction model (i.e., the seq2seq model), and output the standard text corresponding to the text to be corrected.
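A minimal sketch of this inference step as greedy decoding (an assumed illustration; it reuses the Seq2SeqCorrector class sketched earlier, and the start/end token ids and the maximum length are placeholders):

```python
import torch

def correct(model, feature_vector, sos_id=1, eos_id=2, max_len=64):
    """Greedy decoding: encode the feature vector, then generate the
    standard-text character ids one at a time."""
    src_ids = torch.tensor([feature_vector])        # e.g. [[98, 97, 105, 100, 111]]
    _, state = model.encoder(model.embed(src_ids))  # fixed-length encoder state
    token, output = torch.tensor([[sos_id]]), []
    for _ in range(max_len):
        dec_out, state = model.decoder(model.embed(token), state)
        token = model.out(dec_out[:, -1]).argmax(dim=-1, keepdim=True)
        if token.item() == eos_id:
            break
        output.append(token.item())
    return output                                   # character ids of the standard text
```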
In another embodiment, the text correction method may include the following steps:
Step 101: obtain a text to be corrected.
Step 102: select, from multiple candidate coding rules and according to the character type of the characters in the text to be corrected, the coding rule corresponding to the character type.
In an optional embodiment, if the character type is Chinese, a Chinese character coding rule is selected; otherwise, an ASCII coding rule is selected.
Step 103: determine, using the selected coding rule, the feature vector corresponding to the text to be corrected.
Step 105: input the feature vector into the text correction model (i.e., the seq2seq model), and output the standard text corresponding to the text to be corrected.
Based on the text correction model that includes a coding network and a decoding network and is obtained through machine learning, after the text to be corrected is obtained, the feature vector corresponding to the text to be corrected can be input into the text correction model to output the standard text. This realizes the function of correcting text and can therefore improve the text recognition rate in text recognition scenarios. A seq2seq-based text correction model can also largely avoid manual intervention (such as manually formulated rule-based algorithms), making the text recognition process more intelligent and more accurate.
Several application scenarios of the above text correction method are listed below:
1. Recognizing misspelled or rewritten text information such as person names, place names, and company names, and matching them to their standard spellings.
2. In blacklist-user identification scenarios, identifying blacklisted users whose information has been misspelled or rewritten.
3. In information search scenarios, recognizing misspelled or rewritten queries to improve search efficiency.
Corresponding to the above method, one or more embodiments of this specification further provide a text correction apparatus, which can be applied to various types of electronic devices.
As shown in Fig. 3, in one embodiment, a text correction apparatus 300 may include a text obtaining module 301, a conversion module 302, and a text correction module 303, wherein:
the text obtaining module 301 is configured to obtain a text to be corrected;
the conversion module 302 is configured to determine, using a coding rule, a feature vector corresponding to the text to be corrected; and
the text correction module 303 is configured to input the feature vector into a text correction model and output a standard text corresponding to the text to be corrected, wherein the text correction model includes a coding network and a decoding network, and both the coding network and the decoding network are recurrent neural networks (RNNs).
As shown in Fig. 4, in another embodiment, on the basis of the apparatus described in Fig. 3, the apparatus 300 may further include a rule selection module 304, which is configured to select, from multiple candidate coding rules and according to the character type of the characters in the text to be corrected, the coding rule corresponding to the character type.
In the embodiment shown in Fig. 4, the conversion module 302 may be configured to determine, using the selected coding rule, the feature vector corresponding to the text to be corrected.
In an optional embodiment, the conversion module 302 may specifically include:
a coded sequence determining module, configured to determine, one by one using the selected coding rule, the code corresponding to each character in the text to be corrected, so as to obtain a coded sequence corresponding to the text to be corrected; and
a vectorization module, configured to determine, according to the coded sequence, the feature vector corresponding to the text to be corrected.
In an optional embodiment, the rule selection module 304 may be configured to: select a Chinese character coding rule if the character type is Chinese; otherwise, select an ASCII coding rule.
In one embodiment, the apparatus may further include:
a sample obtaining module, configured to obtain a sample set that includes a number of sample pairs, each sample pair including a non-standard text and a standard text;
a coding vector determining module, configured to, for each sample pair, convert the non-standard text into a first coding vector and convert the standard text into a second coding vector using a coding rule; and
a model training module, configured to train the coding network and the decoding network using the first coding vectors and the second coding vectors, so as to obtain the text correction model.
One or more embodiments of this specification provide an electronic device (for example, a user equipment, a server, or another computing device), which may include a processor, an internal bus, a network interface, and memory (including internal memory and non-volatile memory), and of course may also include hardware required by other services. The processor may be one or more instances of a central processing unit (CPU), a processing unit, a processing circuit, a processor, an application-specific integrated circuit (ASIC), a microprocessor, or other processing logic capable of executing instructions. The processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it. Of course, in addition to software implementations, one or more embodiments of this specification do not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution entity of the following processing flow is not limited to logic units and may also be hardware or logic devices.
In one embodiment, the processor may be configured to:
obtain a text to be corrected;
determine, using a coding rule, a feature vector corresponding to the text to be corrected; and
input the feature vector into a text correction model and output a standard text corresponding to the text to be corrected, wherein the text correction model includes a coding network and a decoding network, and both the coding network and the decoding network are recurrent neural networks (RNNs).
The embodiments in this specification are described in a progressive manner; for identical or similar parts of the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the device and apparatus embodiments are substantially similar to the method embodiments, they are described relatively simply; for related parts, reference may be made to the descriptions of the method embodiments.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by computer chips or entities, or by products having certain functions. A typical implementation device is a computer, which may specifically take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail transceiver, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described as being divided into various units by function. Of course, when implementing one or more embodiments of this specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, may be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and internal memory.
The internal memory may include computer-readable media in the form of volatile memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Internal memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. In the absence of further limitations, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
Those skilled in the art should understand that one or more embodiments of this specification may be provided as a method, a system, or a computer program product. Therefore, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.
One or more embodiments of this specification may be described in the general context of computer-executable instructions, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. One or more embodiments of this specification may also be practiced in distributed computing environments, where tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The foregoing describes only embodiments of this specification and is not intended to limit this specification. Various modifications and changes may be made to one or more embodiments of this specification by those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of one or more embodiments of this specification shall fall within the scope of the claims of this specification.

Claims (12)

1. A text correction method, comprising:
obtaining a text to be corrected;
determining, using a coding rule, a feature vector corresponding to the text to be corrected; and
inputting the feature vector into a text correction model and outputting a standard text corresponding to the text to be corrected, wherein the text correction model comprises a coding network and a decoding network, and both the coding network and the decoding network are recurrent neural networks (RNNs).
2. The method according to claim 1, wherein before the determining, using a coding rule, a feature vector corresponding to the text to be corrected, the method further comprises:
selecting, from multiple candidate coding rules and according to the character type of the characters in the text to be corrected, a coding rule corresponding to the character type;
wherein the determining, using a coding rule, a feature vector corresponding to the text to be corrected comprises:
determining, using the selected coding rule, the feature vector corresponding to the text to be corrected.
3. The method according to claim 1 or 2, wherein the determining, using a coding rule, a feature vector corresponding to the text to be corrected comprises:
determining, one by one using the coding rule, a code corresponding to each character in the text to be corrected, to obtain a coded sequence corresponding to the text to be corrected; and
determining, according to the coded sequence, the feature vector corresponding to the text to be corrected.
4. The method according to claim 2, wherein the selecting, from multiple candidate coding rules and according to the character type of the characters in the text to be corrected, a coding rule corresponding to the character type comprises:
selecting a Chinese character coding rule if the character type is Chinese; otherwise, selecting an ASCII coding rule.
5. The method according to claim 1, wherein the training method of the text correction model comprises:
obtaining a sample set comprising a number of sample pairs, each sample pair comprising a non-standard text and a standard text;
for each sample pair, converting the non-standard text into a first coding vector and converting the standard text into a second coding vector using a coding rule; and
training the coding network and the decoding network using the first coding vectors and the second coding vectors, to obtain the text correction model.
6. A text correction apparatus, comprising:
a text obtaining module, configured to obtain a text to be corrected;
a conversion module, configured to determine, using a coding rule, a feature vector corresponding to the text to be corrected; and
a text correction module, configured to input the feature vector into a text correction model and output a standard text corresponding to the text to be corrected, wherein the text correction model comprises a coding network and a decoding network, and both the coding network and the decoding network are recurrent neural networks (RNNs).
7. The apparatus according to claim 6, further comprising:
a rule selection module, configured to select, from multiple candidate coding rules and according to the character type of the characters in the text to be corrected, a coding rule corresponding to the character type;
wherein the conversion module determines, using the selected coding rule, the feature vector corresponding to the text to be corrected.
8. The apparatus according to claim 6 or 7, wherein the conversion module comprises:
a coded sequence determining module, configured to determine, one by one using the selected coding rule, a code corresponding to each character in the text to be corrected, to obtain a coded sequence corresponding to the text to be corrected; and
a vectorization module, configured to determine, according to the coded sequence, the feature vector corresponding to the text to be corrected.
9. The apparatus according to claim 7, wherein the rule selection module is configured to: select a Chinese character coding rule if the character type is Chinese; otherwise, select an ASCII coding rule.
10. The apparatus according to claim 6, further comprising:
a sample obtaining module, configured to obtain a sample set comprising a number of sample pairs, each sample pair comprising a non-standard text and a standard text;
a coding vector determining module, configured to, for each sample pair, convert the non-standard text into a first coding vector and convert the standard text into a second coding vector using a coding rule; and
a model training module, configured to train the coding network and the decoding network using the first coding vectors and the second coding vectors, to obtain the text correction model.
11. An electronic device, comprising:
a processor; and
a memory for storing processor-executable instructions,
wherein the processor is configured to:
obtain a text to be corrected;
determine, using a coding rule, a feature vector corresponding to the text to be corrected; and
input the feature vector into a text correction model and output a standard text corresponding to the text to be corrected, wherein the text correction model comprises a coding network and a decoding network, and both the coding network and the decoding network are recurrent neural networks (RNNs).
12. The electronic device according to claim 11, wherein before the determining, using a coding rule, the feature vector corresponding to the text to be corrected, the processor is further configured to:
select, from multiple candidate coding rules and according to the character type of the characters in the text to be corrected, a coding rule corresponding to the character type;
wherein the determining, using a coding rule, the feature vector corresponding to the text to be corrected comprises:
determining, one by one using the coding rule, a code corresponding to each character in the text to be corrected, to obtain a coded sequence corresponding to the text to be corrected; and
determining, according to the coded sequence, the feature vector corresponding to the text to be corrected.
CN201710618374.3A 2017-07-26 2017-07-26 Text correction method and apparatus, and electronic device Pending CN107451106A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710618374.3A CN107451106A (en) 2017-07-26 2017-07-26 Text correction method and apparatus, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710618374.3A CN107451106A (en) 2017-07-26 2017-07-26 Text correction method and apparatus, and electronic device

Publications (1)

Publication Number Publication Date
CN107451106A true CN107451106A (en) 2017-12-08

Family

ID=60489058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710618374.3A Pending CN107451106A (en) 2017-07-26 2017-07-26 Text method and device for correcting, electronic equipment

Country Status (1)

Country Link
CN (1) CN107451106A (en)



Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160350655A1 (en) * 2015-05-26 2016-12-01 Evature Technologies (2009) Ltd. Systems Methods Circuits and Associated Computer Executable Code for Deep Learning Based Natural Language Understanding
CN106610930A (en) * 2015-10-22 2017-05-03 科大讯飞股份有限公司 Foreign language writing automatic error correction method and system
US20170139905A1 (en) * 2015-11-17 2017-05-18 Samsung Electronics Co., Ltd. Apparatus and method for generating translation model, apparatus and method for automatic translation
CN106815193A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Model training method and device and wrong word recognition methods and device
CN105550173A (en) * 2016-02-06 2016-05-04 北京京东尚科信息技术有限公司 Text correction method and device
CN105930314A (en) * 2016-04-14 2016-09-07 清华大学 Text summarization generation system and method based on coding-decoding deep neural networks
CN106776501A (en) * 2016-12-13 2017-05-31 深圳爱拼信息科技有限公司 A kind of automatic method for correcting of text wrong word and server
CN106656637A (en) * 2017-02-24 2017-05-10 国网河南省电力公司电力科学研究院 Anomaly detection method and device

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110162767A (en) * 2018-02-12 2019-08-23 北京京东尚科信息技术有限公司 The method and apparatus of text error correction
CN108647207A (en) * 2018-05-08 2018-10-12 上海携程国际旅行社有限公司 Natural language modification method, system, equipment and storage medium
CN108647207B (en) * 2018-05-08 2022-04-05 上海携程国际旅行社有限公司 Natural language correction method, system, device and storage medium
CN109408813A (en) * 2018-09-30 2019-03-01 北京金山安全软件有限公司 Text correction method and device
CN109408813B (en) * 2018-09-30 2023-07-21 北京金山安全软件有限公司 Text correction method and device
CN111339755A (en) * 2018-11-30 2020-06-26 中国移动通信集团浙江有限公司 Automatic error correction method and device for office data
WO2020168750A1 (en) * 2019-02-18 2020-08-27 平安科技(深圳)有限公司 Address information standardization method and apparatus, computer device and storage medium
CN109948152A (en) * 2019-03-06 2019-06-28 北京工商大学 A kind of Chinese text grammer error correcting model method based on LSTM
CN113270088A (en) * 2020-02-14 2021-08-17 阿里巴巴集团控股有限公司 Text processing method, data processing method, voice processing method, data processing device, voice processing device and electronic equipment
CN113270088B (en) * 2020-02-14 2022-04-29 阿里巴巴集团控股有限公司 Text processing method, data processing method, voice processing method, data processing device, voice processing device and electronic equipment
CN112597753A (en) * 2020-12-22 2021-04-02 北京百度网讯科技有限公司 Text error correction processing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN107451106A (en) Text correction method and apparatus, and electronic device
US11816442B2 (en) Multi-turn dialogue response generation with autoregressive transformer models
US10380236B1 (en) Machine learning system for annotating unstructured text
CN111444320B (en) Text retrieval method and device, computer equipment and storage medium
CN110929515B (en) Reading understanding method and system based on cooperative attention and adaptive adjustment
US11468239B2 (en) Joint intent and entity recognition using transformer models
CN110532353B (en) Text entity matching method, system and device based on deep learning
CN111814466A (en) Information extraction method based on machine reading understanding and related equipment thereof
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN113590761B (en) Training method of text processing model, text processing method and related equipment
CN108959474B (en) Entity relation extraction method
US11694034B2 (en) Systems and methods for machine-learned prediction of semantic similarity between documents
CN115146068B (en) Method, device, equipment and storage medium for extracting relation triples
CN115587594A (en) Network security unstructured text data extraction model training method and system
CN111079433B (en) Event extraction method and device and electronic equipment
CN112508048A (en) Image description generation method and device
CN115099233A (en) Semantic analysis model construction method and device, electronic equipment and storage medium
CN112395880B (en) Error correction method and device for structured triples, computer equipment and storage medium
CN116127925B (en) Text data enhancement method and device based on destruction processing of text
US20230042327A1 (en) Self-supervised learning with model augmentation
CN115952266A (en) Question generation method and device, computer equipment and storage medium
CN115080748A (en) Weak supervision text classification method and device based on noisy label learning
CN114358023A (en) Intelligent question-answer recall method and device, computer equipment and storage medium
CN113887201A (en) Text fixed-length error correction method, device, equipment and storage medium
CN112446206A (en) Menu title generation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: P.O. Box 847, Fourth Floor, Capital Building, Grand Cayman, Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

TA01 Transfer of patent application right
RJ01 Rejection of invention patent application after publication

Application publication date: 20171208

RJ01 Rejection of invention patent application after publication