CN109284483A - Text handling method, device, storage medium and electronic equipment - Google Patents

Publication number
CN109284483A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811413346.9A
Other languages
Chinese (zh)
Other versions
CN109284483B (en)
Inventor
滕召荣
李坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Golden Panda Co Ltd
Original Assignee
Golden Panda Co Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The present disclosure relates to a text processing method, a text processing apparatus, a computer-readable storage medium, and an electronic device. The text processing method provided by embodiments of the disclosure includes: detecting whether the text to be processed contains an abnormal mark; if an abnormal mark is detected in the text to be processed, performing text cleaning on the abnormal mark; and structuring the text to be processed to obtain structured data. The method retains the valid data in the text to be processed to a large extent and avoids data loss.

Description

Text handling method, device, storage medium and electronic equipment
Technical field
The present disclosure relates to the field of computer technology, and in particular to a text processing method, a text processing apparatus, a computer-readable storage medium, and an electronic device.
Background
Structuring is one of the important techniques in NLP (Natural Language Processing). Structuring a text means extracting the required content from natural language text to form structured data. This inherently relies on tools such as regular expressions and dictionaries to match the structured data that is needed. A normal passage of Chinese medical text should consist mostly of Chinese characters, mixed with a small number of digits, letters, or special characters. If a passage contains a large number of digits, English letters, or abnormal symbols, the passage can be considered abnormal. When abnormal medical text is structured, the regular expressions match in greedy mode, so on the one hand the matching consumes a great deal of resources, and on the other hand it generates a very large number of data objects (possibly millions). The operating system then lacks the resources to handle them, the load becomes very high, and processing a single abnormal passage may take several days without completing. Detecting and cleaning truly abnormal medical text is therefore an important technique, and one that is hard to get right.
At present there are mainly two methods for checking abnormal medical text and raising an alert:
The first is abnormality matching: the medical text is checked for runs of consecutive digits, English characters, or special characters. If such a run appears, the text is considered abnormal and is discarded after an alert is raised.
The second is timeout detection: a timeout mechanism checks the medical text while it is being structured. Structuring normal medical text generally takes only a short time, so if the time spent on structuring reaches a certain threshold, the text is considered abnormal and is then discarded.
Both methods judge abnormal text along a single dimension. When checking medical text for abnormality, they discard a large amount of useful data that could have been structured, which leads to serious loss of normal data.
It should be noted that the information disclosed in the Background section above is only intended to aid understanding of the background of the disclosure, and may therefore include information that does not constitute prior art known to those of ordinary skill in the art.
Summary of the invention
The purpose of the present disclosure is to provide a text processing method, a text processing apparatus, a computer-readable storage medium, and an electronic device, thereby overcoming, at least to some extent, the technical problem of serious data loss caused by the limitations and defects of the related art.
According to one aspect of the disclosure, a text processing method is provided, comprising:
detecting whether the text to be processed contains an abnormal mark;
if an abnormal mark is detected in the text to be processed, performing text cleaning on the abnormal mark;
structuring the text to be processed to obtain structured data.
In an exemplary embodiment of the disclosure, detecting whether the text to be processed contains an abnormal mark includes:
detecting the length of the text to be processed, and judging whether the length is greater than a preset threshold;
if the length is judged to be greater than the preset threshold, detecting whether the text to be processed contains an abnormal mark.
In an exemplary embodiment of the disclosure, detecting whether the text to be processed contains an abnormal mark further includes:
if the length is judged to be less than or equal to the preset threshold, structuring the text to be processed to obtain structured data.
In an exemplary embodiment of the disclosure, structuring the text to be processed to obtain structured data includes:
performing abnormal feature detection on the text to be processed to judge whether the text to be processed is normal text or abnormal text;
if the text to be processed is judged to be normal text, structuring the text to be processed to obtain structured data.
In an exemplary embodiment of the disclosure, performing abnormal feature detection on the text to be processed to judge whether the text to be processed is normal text or abnormal text includes:
detecting whether the text to be processed contains a continuous non-Chinese field;
if a continuous non-Chinese field is detected in the text to be processed, judging the text to be processed to be abnormal text;
if no continuous non-Chinese field is detected in the text to be processed, judging the text to be processed to be normal text.
In an exemplary embodiment of the disclosure, structuring the text to be processed to obtain structured data further includes:
if the text to be processed is judged to be abnormal text, importing the abnormal text into an abnormal text collection;
when the abnormal text collection meets a preset condition, sending abnormal text prompt information.
In an exemplary embodiment of the disclosure, after the abnormal text prompt information is sent, the method further includes:
analyzing the abnormal texts in the abnormal text collection to obtain abnormal marks and form an abnormal mark set.
According to one aspect of the disclosure, a text processing apparatus is provided, comprising:
a detection module, configured to detect whether the text to be processed contains an abnormal mark;
a cleaning module, configured to perform text cleaning on the part of the text to be processed that contains the abnormal mark, if an abnormal mark is detected in the text to be processed;
a processing module, configured to structure the text to be processed to obtain structured data.
According to one aspect of the disclosure, a computer-readable storage medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements any of the text processing methods described above.
According to one aspect of the disclosure, an electronic device is provided, comprising a processor and a memory, wherein the memory stores instructions executable by the processor, and the processor is configured to execute any of the text processing methods described above by executing the executable instructions.
By performing abnormal mark detection on the text to be processed, the text processing method provided by embodiments of the disclosure cleans the abnormal parts of the text rather than simply discarding the whole text, so it retains the valid data in the text to be processed to a large extent and avoids data loss.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory and do not limit the disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and, together with the specification, serve to explain its principles. The drawings described below are obviously only some embodiments of the disclosure; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 schematically shows a flowchart of the steps of the text processing method in an exemplary embodiment of the disclosure.
Fig. 2 schematically shows a flowchart of the steps of the text processing method in another exemplary embodiment of the disclosure.
Fig. 3 schematically shows a flowchart of the steps of the text processing method in another exemplary embodiment of the disclosure.
Fig. 4 schematically shows a flowchart of the steps of the text processing method in another exemplary embodiment of the disclosure.
Fig. 5 schematically shows a block diagram of the text processing apparatus in an exemplary embodiment of the disclosure.
Fig. 6 schematically shows a program product in an exemplary embodiment of the disclosure.
Fig. 7 schematically shows a module diagram of an electronic device in an exemplary embodiment of the disclosure.
Fig. 8 schematically shows a flowchart of the text processing method of an exemplary embodiment of the disclosure in an application scenario.
Detailed description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In addition, the drawings are only schematic illustrations of the disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so repeated descriptions of them are omitted. Some of the block diagrams shown in the drawings are functional entities that do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
An exemplary embodiment of the disclosure first provides a text processing method, which can be applied mainly to the structuring of medical text in order to obtain structured data from it. Medical text may include patient medical histories, inpatient records, and other electronic texts containing medical data. Referring to Fig. 1, the text processing method provided by this embodiment may mainly include the following steps:
Step S110. Detect whether the text to be processed contains an abnormal mark.
This step first examines the content of the text to be processed and judges whether it contains an abnormal mark. An abnormal mark may include garbled characters appearing in the text, or meaningless strings of digits, letters, or special characters. To detect abnormal marks more effectively, an abnormal mark detection rule can be formulated, and anything that satisfies the rule is judged to be an abnormal mark. Alternatively, an abnormal mark set can be preset, and the abnormal marks in that set are matched against the text to be processed; any matched text that is found is judged to be an abnormal mark.
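As a non-limiting illustration, the two detection options of step S110 (a detection rule and a preset abnormal mark set) might be sketched as follows. The rule (a run of more than 10 printable ASCII characters), the set contents, and the name `find_abnormal_marks` are assumptions made for this sketch and are not specified by the disclosure.

```python
import re

# Hypothetical detection rule: a run of more than 10 printable ASCII
# characters (letters, digits, symbols; spaces excluded) is an abnormal mark.
ABNORMAL_RULE = re.compile(r"[\x21-\x7e]{11,}")

# Hypothetical preset abnormal mark set.
ABNORMAL_MARK_SET = {"&nbsp;", "\ufffd\ufffd"}


def find_abnormal_marks(text):
    """Return the abnormal marks found by the rule, then by the preset set."""
    found = [match.group(0) for match in ABNORMAL_RULE.finditer(text)]
    found += [mark for mark in sorted(ABNORMAL_MARK_SET) if mark in text]
    return found
```

An empty result means that no abnormal mark was detected under these assumed criteria, so the text can proceed without cleaning.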
Step S120. If an abnormal mark is detected in the text to be processed, perform text cleaning on the abnormal mark.
Based on the result of step S110, if an abnormal mark is detected in the text to be processed, this step performs text cleaning on that abnormal mark. The purpose of text cleaning is that the text to be processed no longer contains abnormal marks. To improve the cleaning effect, this embodiment can clean the abnormal marks in a loop, that is, repeat detection and cleaning until no abnormal mark can be detected in the text to be processed, at which point cleaning is considered complete. The abnormal mark may be cleaned by deleting it directly, by replacing it with a specific recognizable marker, or by any other text processing method; this exemplary embodiment places no particular limitation on this.
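The loop cleaning described above might look like the following minimal sketch, which assumes set-based detection. The function name `clean_text`, its parameters, and the round limit are hypothetical; the disclosure only requires that detection and cleaning repeat until no abnormal mark remains.

```python
def clean_text(text, marks, placeholder="", max_rounds=100):
    """Repeat detection and cleaning until no abnormal mark remains.

    marks       -- iterable of abnormal marks (hypothetical preset set)
    placeholder -- "" deletes the mark; any other string replaces it
    max_rounds  -- safety bound in case the placeholder itself contains a mark
    """
    for _ in range(max_rounds):
        present = [mark for mark in marks if mark and mark in text]
        if not present:
            break  # nothing left to clean
        for mark in present:
            text = text.replace(mark, placeholder)
    return text
```

The loop matters because deleting one occurrence can expose another: cleaning `<br>` out of `头痛<b<br>r>三天` leaves `头痛<br>三天` after the first round, and only the second round yields `头痛三天`.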
Step S130. Structure the text to be processed to obtain structured data.
After the text cleaning of step S120, this step structures the text to be processed, which no longer contains abnormal marks, to obtain structured data. In general, structuring may first segment the text to be processed into words, then extract information from the segmented text, and finally produce structured data. Structuring may use a data processing platform or computing engine such as Apache Spark.
By performing abnormal mark detection on the text to be processed, the text processing method provided by this exemplary embodiment cleans the abnormal parts of the text rather than simply discarding the whole text, so it retains the valid data in the text to be processed to a large extent and avoids serious data loss.
On the basis of the foregoing exemplary embodiment, another embodiment of the disclosure provides a text processing method in which step S110, detecting whether the text to be processed contains an abnormal mark, may include the following steps, as shown in Fig. 2:
Step S211. Detect the length of the text to be processed, and judge whether the length is greater than a preset threshold.
Because abnormal medical text is usually an extremely long run of digits, English letters, or special characters, this step first detects the text length before abnormal marks are detected and cleaned, and judges whether the detected length is greater than a preset threshold. The preset threshold may be tied directly to information such as the source and type of the text to be processed, or may be obtained by statistical analysis of historical text processing data; for example, it may be 2000 characters or 3000 characters. This exemplary embodiment places no particular limitation on this.
Step S212. If the length is judged to be greater than the preset threshold, detect whether the text to be processed contains an abnormal mark.
Based on the result of step S211, if the length of the text to be processed is judged to be greater than the preset threshold, the text is considered likely to be abnormal. Detection then continues to determine whether the text contains an abnormal mark, followed by text cleaning of the abnormal mark and the subsequent structuring.
On the basis of this embodiment, step S110, detecting whether the text to be processed contains an abnormal mark, may further include the following step:
Step S213. If the length is judged to be less than or equal to the preset threshold, structure the text to be processed to obtain structured data.
If the result of step S211 is that the length of the text to be processed is less than or equal to the preset threshold, the text can be considered normal text and enters the normal structuring flow directly, where it is structured to obtain structured data.
By checking the length of the text to be processed, the text processing method provided by this embodiment identifies suspected abnormal text at minimal computational cost, which improves text processing efficiency and avoids wasting resources.
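The length gate of steps S211 to S213 amounts to a single comparison, as in the following sketch. The function name `route_by_length` and the returned labels are hypothetical; the default of 2000 characters is one of the example thresholds mentioned above.

```python
def route_by_length(text, threshold=2000):
    """Decide which branch a text takes based on its length alone."""
    if len(text) > threshold:
        # S212: suspected abnormal text, continue with mark detection and cleaning
        return "detect_and_clean"
    # S213: treated as normal text, structure it directly
    return "structure_directly"
```

Because `len` on a Python string is a constant-time operation, this check costs far less than running any pattern matching over the text.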
On the basis of the foregoing exemplary embodiment, another embodiment of the disclosure provides a text processing method in which step S130, structuring the text to be processed to obtain structured data, may include the following steps, as shown in Fig. 3:
Step S331. Perform abnormal feature detection on the text to be processed to judge whether the text to be processed is normal text or abnormal text.
After abnormal marks have been detected and cleaned, this step performs abnormal feature detection on the text to be processed to judge more accurately whether it is normal text or abnormal text. As with abnormal marks, the abnormal features detected in this step may include garbled characters and meaningless strings of digits, letters, or special characters. In other words, part of what this step detects overlaps with the abnormal mark detection of step S110. The abnormal features may also include other features generated from historical data or preset in advance, in particular abnormal parts that abnormal mark detection missed, or that were detected but not cleaned. An abnormal part may be missed for several reasons: for example, the abnormal mark set may be incomplete, or omissions may occur when large volumes of text are processed. An abnormal part may likewise be detected but not cleaned for several reasons: for example, editing permissions set on the text to be processed may prevent the abnormal part from being deleted or replaced. In addition, before the abnormal feature detection of this step, the length of the text to be processed can be checked first, and the abnormal feature detection is executed only if the length exceeds a preset threshold. If the length does not exceed the preset threshold, this step can be skipped and the text structured directly, which saves computing resources and improves text processing efficiency.
Step S332. If the text to be processed is judged to be normal text, structure the text to be processed to obtain structured data.
After the detection and judgment of step S331, if the text to be processed is judged to be normal text, this step structures the text to be processed to obtain structured data.
On the basis of this embodiment, step S130, structuring the text to be processed to obtain structured data, may further include the following steps:
Step S333. If the text to be processed is judged to be abnormal text, import the abnormal text into an abnormal text collection.
After the detection and judgment of step S331, if the text to be processed is judged to be abnormal text, this step imports the abnormal text into the abnormal text collection. Abnormal texts in the collection are not structured for the time being, so that they do not affect the structuring of normal text.
Step S334. When the abnormal text collection meets a preset condition, send abnormal text prompt information.
According to step S333, as abnormal texts continue to be detected, they continue to be imported into the abnormal text collection, and once the collection meets a preset condition this step sends abnormal text prompt information. The preset condition may be that the number of abnormal texts in the collection exceeds a preset threshold, that a certain point in time is reached after the first abnormal text was imported into the collection, or that a batch of text processing has finished and the collection contains abnormal text; this exemplary embodiment places no particular limitation on this. The abnormal text prompt information may be any information that serves as a prompt or warning, for example an alert email sent to the relevant staff. It may include the content of the abnormal text or the storage path of the abnormal text. In some related techniques, a prompt is sent every time an abnormal text is detected, so prompts are too frequent and redundant, which harms the user experience. By setting a preset condition, this embodiment controls the frequency and number of abnormal text prompts and improves the user experience.
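Steps S333 and S334 might be sketched as follows, as an illustration only. The class name `AbnormalTextCollection`, its parameters, and the message wording are hypothetical. The sketch covers the count-based and time-based preset conditions, and it checks the time condition only when a new text is imported, which is a simplification.

```python
import time


class AbnormalTextCollection:
    """Collects abnormal texts and sends a prompt only when a preset condition is met."""

    def __init__(self, max_pending=100, max_wait_seconds=3600.0,
                 notify=print, clock=time.monotonic):
        self.texts = []                           # kept for later analysis
        self.max_pending = max_pending            # hypothetical count threshold
        self.max_wait_seconds = max_wait_seconds  # hypothetical time limit
        self._notify = notify                     # e.g. a function that sends an email
        self._clock = clock
        self._pending = 0                         # texts imported since the last prompt
        self._first_pending_at = None

    def add(self, text):
        """Import one abnormal text; return True if a prompt was sent."""
        now = self._clock()
        if self._pending == 0:
            self._first_pending_at = now
        self.texts.append(text)
        self._pending += 1
        too_many = self._pending > self.max_pending
        waited_too_long = now - self._first_pending_at >= self.max_wait_seconds
        if too_many or waited_too_long:
            self._notify(f"{self._pending} abnormal texts are waiting for review")
            self._pending = 0
            return True
        return False
```

Passing `notify` and `clock` as parameters keeps the alerting channel and the time source replaceable, so the batching logic can be exercised without sending real alerts.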
Preferably, after the abnormal text prompt information is sent, the text processing method provided by this embodiment may further include the step of analyzing the abnormal texts in the abnormal text collection to obtain abnormal marks and form an abnormal mark set. This step analyzes an abnormal text, finds the abnormal parts in it, and obtains abnormal marks from them to form the abnormal mark set. Newly obtained abnormal marks can later be added to the set. The abnormal mark set supports the abnormal mark detection of step S110. As the set grows richer and more complete, abnormal marks are detected and cleaned more thoroughly, fewer texts are judged abnormal, and text processing efficiency improves considerably.
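The disclosure does not say how the abnormal texts are analyzed; the analysis could well be manual. Purely as an assumption, the following sketch harvests candidate abnormal marks automatically by collecting every run of more than 10 printable ASCII characters and merging the results into an existing set. The name `extend_mark_set` and the pattern are hypothetical.

```python
import re

# Hypothetical heuristic: a run of more than 10 printable ASCII characters
# is taken as a candidate abnormal mark.
CANDIDATE_RUN = re.compile(r"[\x21-\x7e]{11,}")


def extend_mark_set(abnormal_texts, mark_set=None):
    """Return a new set: the existing marks plus candidates found in the texts."""
    marks = set() if mark_set is None else set(mark_set)
    for text in abnormal_texts:
        marks.update(match.group(0) for match in CANDIDATE_RUN.finditer(text))
    return marks
```

In practice the candidates would probably need review before being trusted for cleaning, because a long but legitimate token would also match this heuristic.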
Referring to Fig. 4, in another exemplary embodiment of the disclosure, step S331, performing abnormal feature detection on the text to be processed to judge whether it is normal text or abnormal text, may further include the following steps:
Step S3311. Detect whether the text to be processed contains a continuous non-Chinese field.
This step first detects whether the text to be processed contains a continuous non-Chinese field, that is, a continuous field made up of non-Chinese characters such as digits, English letters, or special characters.
Step S3312. If a continuous non-Chinese field is detected in the text to be processed, judge the text to be processed to be abnormal text.
If step S3311 detects a continuous non-Chinese field in the text to be processed, this step judges the text to be processed to be abnormal text.
Step S3313. If no continuous non-Chinese field is detected in the text to be processed, judge the text to be processed to be normal text.
If step S3311 does not detect a continuous non-Chinese field in the text to be processed, this step judges the text to be processed to be normal text.
Further, this embodiment may also restrict the length of a continuous non-Chinese field: for example, a run of non-Chinese characters longer than 10 characters is treated as a continuous non-Chinese field, whereas a run shorter than 10 characters is treated as a normal text field.
This embodiment performs abnormal feature detection using continuous non-Chinese fields, which suits the linguistic characteristics of Chinese medical text and completes the detection of abnormal text efficiently.
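Steps S3311 to S3313, together with the length restriction, might be sketched as follows. Here "non-Chinese character" is narrowed to printable ASCII (digits, English letters, and ASCII symbols), and a run of exactly 10 characters is treated as normal; both choices, and the name `classify_text`, are assumptions for illustration.

```python
import re


def classify_text(text, max_run=10):
    """Return "abnormal" if the text contains a run of more than max_run
    printable ASCII characters, otherwise "normal"."""
    pattern = re.compile(r"[\x21-\x7e]{%d,}" % (max_run + 1))
    return "abnormal" if pattern.search(text) else "normal"
```

The length restriction keeps ordinary clinical values from being flagged: in `血压120/80mmHg，心率正常` the run `120/80mmHg` is exactly 10 characters long, so the text is still classified as normal.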
It should be noted that although the foregoing exemplary embodiments describe the steps of the method of the disclosure in a particular order, this does not require or imply that the steps must be executed in that order, or that all of the steps must be executed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be merged into one, and/or one step may be split into multiple steps.
An exemplary embodiment of the disclosure also provides a text processing apparatus. Referring to Fig. 5, the text processing apparatus 50 may mainly include a detection module 51, a cleaning module 52, and a processing module 53. The detection module 51 is configured to detect whether the text to be processed contains an abnormal mark. The cleaning module 52 is configured to perform text cleaning on the part of the text to be processed that contains the abnormal mark, if an abnormal mark is detected in the text to be processed. The processing module 53 is configured to structure the text to be processed to obtain structured data.
The details of the above text processing apparatus have been described in the corresponding text processing method and are therefore not repeated here.
It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the disclosure, the features and functions of two or more of the modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.
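As one possible software form of the apparatus of Fig. 5, the three modules can be modeled as interchangeable callables. The class name `TextProcessingApparatus` and the module implementations below are hypothetical; in particular, `structure` is only a toy stand-in that splits sentences and does not perform real word segmentation or information extraction.

```python
class TextProcessingApparatus:
    """Wires a detection module, a cleaning module, and a processing module together."""

    def __init__(self, detection_module, cleaning_module, processing_module):
        self.detection_module = detection_module
        self.cleaning_module = cleaning_module
        self.processing_module = processing_module

    def run(self, text):
        if self.detection_module(text):        # corresponds to step S110
            text = self.cleaning_module(text)  # corresponds to step S120
        return self.processing_module(text)    # corresponds to step S130


# Hypothetical module implementations, for illustration only.
MARKS = ("&nbsp;",)


def detect(text):
    return any(mark in text for mark in MARKS)


def clean(text):
    while detect(text):  # loop cleaning: repeat until no mark remains
        for mark in MARKS:
            text = text.replace(mark, "")
    return text


def structure(text):
    # Toy stand-in for structuring: split on the Chinese full stop.
    return {"sentences": [s for s in text.split("。") if s]}


apparatus = TextProcessingApparatus(detect, clean, structure)
```

Because the modules are passed in rather than hard-coded, two of them can be merged into one callable or one can be split further, in line with the note above that the module division is not mandatory.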
In the illustrative embodiments of the disclosure, a kind of computer readable storage medium is also provided, is stored thereon with meter Calculation machine program can realize the above-mentioned text handling method of the disclosure when computer program is executed by processor.Some In possible embodiment, various aspects of the disclosure is also implemented as a kind of form of program product comprising program generation Code;The program product can store in a non-volatile memory medium (can be CD-ROM, USB flash disk or mobile hard disk etc.) Or on network;When described program product (can be personal computer, server, terminal installation or net in a calculating equipment Network equipment etc.) on when running, said program code is for making above-mentioned each exemplary implementation in the calculatings equipment execution disclosure Method and step in example.
It is shown in Figure 6, it, can be with according to the program product 60 for realizing the above method of embodiment of the present disclosure Using portable compact disc read-only memory (CD-ROM) and including program code, and can be to calculate equipment (such as personal Computer, server, terminal installation or network equipment etc.) on run.However, the program product of the disclosure is without being limited thereto.? In the present exemplary embodiment, computer readable storage medium can be any tangible medium for including or store program, the program Execution system, device or device use or in connection can be commanded.
Described program product can use any combination of one or more readable medium.Readable medium can be readable Signal media or readable storage medium storing program for executing.
Readable storage medium storing program for executing for example can be but be not limited to the system of electricity, magnetic, optical, electromagnetic, infrared ray or semiconductor, device Or device or any above combination.The more specific example (non exhaustive list) of readable storage medium storing program for executing includes: with one The electrical connection of a or multiple conducting wires, portable disc, hard disk, random access memory (RAM), read-only memory (ROM), erasable type Programmable read only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), optical memory Part, magnetic memory device or above-mentioned any appropriate combination.
Readable signal medium may include in a base band or as the data-signal that carrier wave a part is propagated, wherein carrying Readable program code.The data-signal of this propagation can take various forms, including but not limited to electromagnetic signal, optical signal Or above-mentioned any appropriate combination.Readable signal medium can also be any readable medium other than readable storage medium storing program for executing, should Readable medium can send, propagate or transmit for by instruction execution system, device or device use or it is in connection The program used.
Program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
Program code for carrying out the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the C language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).
An exemplary embodiment of the present disclosure also provides an electronic device comprising at least one processor and at least one memory for storing instructions executable by the processor, wherein the processor is configured to perform, by executing the executable instructions, the method steps of the above exemplary embodiments of the present disclosure.
The electronic device 700 of this exemplary embodiment is described below with reference to Fig. 7. The electronic device 700 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in Fig. 7, the electronic device 700 takes the form of a general-purpose computing device. The components of the electronic device 700 may include, but are not limited to: at least one processing unit 710, at least one storage unit 720, a bus 730 connecting the different system components (including the processing unit 710 and the storage unit 720), and a display unit 740.
The storage unit 720 stores program code that can be executed by the processing unit 710, causing the processing unit 710 to perform the method steps of the above exemplary embodiments of the present disclosure.
The storage unit 720 may include readable media in the form of volatile storage units, such as a random access storage unit 721 (RAM) and/or a cache storage unit 722, and may further include a read-only storage unit 723 (ROM).
The storage unit 720 may also include a program/utility 724 having a set of (at least one) program modules 725. Such program modules include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 730 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 700 may also communicate with one or more external devices 800 (such as a keyboard, a pointing device, or a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 700, and/or with any device (such as a router or a modem) that enables the electronic device 700 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 750. The electronic device 700 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 760. As shown in Fig. 7, the network adapter 760 may communicate with the other modules of the electronic device 700 via the bus 730. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 700, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Those skilled in the art will appreciate that aspects of the present disclosure may be implemented as a system, a method, or a program product. Accordingly, aspects of the present disclosure may take the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software, which may be collectively referred to herein as a "circuit", "module", or "system".
The disclosed embodiments are explained below with reference to an application scenario. As shown in Fig. 8, the main text-processing flow in this application scenario is as follows:
1. Abnormal medical text is generally an overlong passage of text, English letters, or special characters. Text length is therefore checked first: when the length of a medical text is greater than a certain threshold, the text is considered possibly abnormal. Text shorter than the threshold is considered normal and enters the normal structuring flow.
2. Second, the suspected abnormal text is cleaned. The text is checked for abnormal marks; if any are found, cleaning is performed in a loop, that is, check-and-clean is repeated until no abnormal mark can be found in the text, at which point cleaning is considered complete.
3. Actual abnormal-feature detection is then performed on the cleaned suspected abnormal text by checking it for continuous non-Chinese characters. If the cleaned text is found to be abnormal, it is added to the abnormal text set; if it is normal, normal structuring processing is performed.
4. Finally, once all texts have been structured, the abnormal text set is written to disk and its path is recorded. An email is then sent to the relevant person in charge, alerting them with partial content of the abnormal texts and the abnormal text path.
5. The relevant person in charge inspects the corresponding abnormal texts and extracts the corresponding abnormal marks into the abnormal mark set, which supplies the cleaning features for suspected abnormal text.
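Steps 1 to 3 of the flow above can be sketched roughly as follows. This is only an illustrative sketch, not the patent's implementation: the function names, the threshold of 200 characters, the minimum run length of 30, and the use of the CJK Unified Ideographs range U+4E00 to U+9FFF to mean "Chinese character" are all assumptions introduced here. Text whose length equals the threshold is treated as normal, following claim 3.

```python
import re


def clean_text(text, abnormal_marks):
    """Repeat check-and-clean until no abnormal mark remains (step 2)."""
    marks = [m for m in abnormal_marks if m]  # ignore empty marks to avoid an endless loop
    # Removing one mark can expose another (e.g. "<b<br>r>"), hence the loop.
    while any(m in text for m in marks):
        for m in marks:
            text = text.replace(m, "")
    return text


def has_non_chinese_run(text, min_run=30):
    """Return True if text has a run of at least min_run non-Chinese characters (step 3).

    min_run and the U+4E00..U+9FFF range are assumptions of this sketch.
    """
    pattern = re.compile(r"[^\u4e00-\u9fff]{%d,}" % min_run)
    return pattern.search(text) is not None


def classify_text(text, abnormal_marks, length_threshold=200, min_run=30):
    """Return ("normal" | "abnormal", text_after_any_cleaning)."""
    # Step 1: short text is treated as normal and goes straight to structuring.
    if len(text) <= length_threshold:
        return "normal", text
    # Step 2: clean the suspected abnormal text.
    cleaned = clean_text(text, abnormal_marks)
    # Step 3: real abnormal-feature detection on the cleaned text.
    if has_non_chinese_run(cleaned, min_run):
        return "abnormal", cleaned
    return "normal", cleaned
```

In this sketch, a text classified as "normal" would go on to structuring processing, while an "abnormal" one would be added to the abnormal text set. The abnormal mark set passed in corresponds to the set maintained in step 5.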
In this application scenario, functions such as length detection and data cleaning are added to abnormal text detection, and a centralized alarm method is used for the abnormal text alarm mechanism, which improves the user experience.
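The centralized alarm of step 4 might look like the sketch below: abnormal texts are collected, each is written to disk with its path recorded, and a single alert message containing partial content and paths is built once at the end instead of one alert per text. The class name, file naming scheme, and excerpt length are hypothetical, and actually sending the email is deliberately left out.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class AbnormalTextCollection:
    """Collects abnormal texts and builds one combined alert (hypothetical helper)."""

    out_dir: Path
    paths: list = field(default_factory=list)
    excerpts: list = field(default_factory=list)

    def add(self, text, excerpt_len=40):
        """Write one abnormal text to disk and record its path and a short excerpt."""
        self.out_dir.mkdir(parents=True, exist_ok=True)
        path = self.out_dir / f"abnormal_{len(self.paths):04d}.txt"
        path.write_text(text, encoding="utf-8")
        self.paths.append(path)
        self.excerpts.append(text[:excerpt_len])
        return path

    def build_alert(self):
        """Return a single alert body for all collected texts, or None if there are none."""
        if not self.paths:
            return None
        lines = [f"{len(self.paths)} abnormal text(s) found:"]
        for path, excerpt in zip(self.paths, self.excerpts):
            lines.append(f"{path}: {excerpt}")
        return "\n".join(lines)
```

The string returned by `build_alert` would be handed to whatever mail mechanism is in use after all texts have been structured, so the person in charge receives one message per batch.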
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the present disclosure indicated by the appended claims.
The features, structures, or characteristics described above may be combined in any suitable manner in one or more embodiments, and where possible, the features discussed in the embodiments are interchangeable. In the above description, many specific details are provided to give a thorough understanding of the embodiments of the present disclosure. Those skilled in the art will appreciate, however, that the technical solution of the present disclosure may be practiced without one or more of the specific details, or with other methods, components, materials, and so on. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the present disclosure.

Claims (10)

1. A text processing method, characterized by comprising:
detecting whether a text to be processed contains an abnormal mark;
if an abnormal mark is detected in the text to be processed, performing text cleaning on the abnormal mark;
performing structuring processing on the text to be processed to obtain structured data.
2. The text processing method according to claim 1, characterized in that detecting whether the text to be processed contains an abnormal mark comprises:
detecting the length of the text to be processed, and judging whether the length is greater than a preset threshold;
if the length is judged to be greater than the preset threshold, detecting whether the text to be processed contains an abnormal mark.
3. The text processing method according to claim 2, characterized in that detecting whether the text to be processed contains an abnormal mark further comprises:
if the length is judged to be less than or equal to the preset threshold, performing structuring processing on the text to be processed to obtain structured data.
4. The text processing method according to claim 1, characterized in that performing structuring processing on the text to be processed to obtain structured data comprises:
performing abnormal-feature detection on the text to be processed to judge whether the text to be processed is normal text or abnormal text;
if the text to be processed is judged to be normal text, performing structuring processing on the text to be processed to obtain structured data.
5. The text processing method according to claim 4, characterized in that performing abnormal-feature detection on the text to be processed to judge whether the text to be processed is normal text or abnormal text comprises:
detecting whether the text to be processed contains continuous non-Chinese fields;
if continuous non-Chinese fields are detected in the text to be processed, judging the text to be processed to be abnormal text;
if no continuous non-Chinese fields are detected in the text to be processed, judging the text to be processed to be normal text.
6. The text processing method according to claim 4, characterized in that performing structuring processing on the text to be processed to obtain structured data further comprises:
if the text to be processed is judged to be abnormal text, adding the abnormal text to an abnormal text set;
when the abnormal text set meets a preset condition, sending abnormal text prompt information.
7. The text processing method according to claim 6, characterized in that, after the abnormal text prompt information is sent, the method further comprises:
analyzing the abnormal texts in the abnormal text set to obtain abnormal marks, so as to form an abnormal mark set.
8. A text processing apparatus, characterized by comprising:
a detection module, configured to detect whether a text to be processed contains an abnormal mark;
a cleaning module, configured to, if an abnormal mark is detected in the text to be processed, perform text cleaning on the part of the text to be processed that contains the abnormal mark;
a processing module, configured to perform structuring processing on the text to be processed to obtain structured data.
9. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the text processing method according to any one of claims 1-7.
10. An electronic device, characterized by comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to perform the text processing method according to any one of claims 1-7 by executing the executable instructions.
CN201811413346.9A 2018-11-23 2018-11-23 Text processing method and device, storage medium and electronic equipment Active CN109284483B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811413346.9A CN109284483B (en) 2018-11-23 2018-11-23 Text processing method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811413346.9A CN109284483B (en) 2018-11-23 2018-11-23 Text processing method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN109284483A true CN109284483A (en) 2019-01-29
CN109284483B CN109284483B (en) 2023-06-30

Family

ID=65172631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811413346.9A Active CN109284483B (en) 2018-11-23 2018-11-23 Text processing method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN109284483B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112114987A (en) * 2019-06-20 2020-12-22 腾讯科技(深圳)有限公司 Method and device for detecting abnormity of operating environment, intelligent terminal and storage medium
WO2021032055A1 (en) * 2019-08-19 2021-02-25 金色熊猫有限公司 Automatic entry method and device for clinical trial reports, electronic equipment, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105260357A (en) * 2015-10-14 2016-01-20 北京京东尚科信息技术有限公司 Sensitive word checking method and device based on Hash sensitive words directed graph
US20160253313A1 (en) * 2015-02-27 2016-09-01 Nuance Communications, Inc. Updating language databases using crowd-sourced input
CN106445915A (en) * 2016-09-14 2017-02-22 科大讯飞股份有限公司 New word discovery method and device
CN107657060A (en) * 2017-10-20 2018-02-02 中电科新型智慧城市研究院有限公司 A kind of characteristic optimization method based on semi-structured text classification
CN108228851A (en) * 2018-01-10 2018-06-29 北京奇艺世纪科技有限公司 A kind of lists of keywords method of adjustment, device and electronic equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112114987A (en) * 2019-06-20 2020-12-22 腾讯科技(深圳)有限公司 Method and device for detecting abnormity of operating environment, intelligent terminal and storage medium
CN112114987B (en) * 2019-06-20 2024-04-09 腾讯科技(深圳)有限公司 Abnormality detection method and device for operation environment, intelligent terminal and storage medium
WO2021032055A1 (en) * 2019-08-19 2021-02-25 金色熊猫有限公司 Automatic entry method and device for clinical trial reports, electronic equipment, and storage medium

Also Published As

Publication number Publication date
CN109284483B (en) 2023-06-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant