CN109284483A - Text handling method, device, storage medium and electronic equipment
- Publication number
- CN109284483A (application number CN201811413346.9A)
- Authority
- CN
- China
- Prior art keywords
- text
- processed
- abnormal
- handling method
- mark
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The present disclosure relates to a text processing method, a text processing apparatus, a computer-readable storage medium, and an electronic device. The text processing method provided by embodiments of the disclosure includes: detecting whether a text to be processed contains an abnormal mark; if an abnormal mark is detected in the text to be processed, performing text cleaning on the abnormal mark; and performing structuring processing on the text to be processed to obtain structured data. The method retains as much of the valid data in the text to be processed as possible and avoids data loss.
Description
Technical field
The present disclosure relates to the field of computer technology, and in particular to a text processing method, a text processing apparatus, a computer-readable storage medium, and an electronic device.
Background art
Structuring is one of the important technologies in NLP (Natural Language Processing). Structuring a text means extracting the required content from natural-language text to form structured data. This inevitably relies on tools such as regular expressions and dictionaries to match the required structured data. A normal passage of Chinese medical text should consist mostly of Chinese characters, mixed with a small number of digits, letters, or special characters. If a passage contains a large number of digits, English letters, or unusual symbols, the passage can be considered abnormal. When structuring processing is performed on abnormal medical text, because regular expressions match greedily, the matching consumes a great deal of resources on the one hand and generates a very large number of data objects (possibly millions) on the other. The operating system then lacks the resources to cope, the load becomes very high, and processing a single abnormal text may not finish even after several days. Detecting and cleaning genuinely abnormal medical text is therefore an important technique, and one that is hard to get right.
At present, there are two main methods for checking and alerting on abnormal medical text:
The first is abnormality matching: the medical text is checked for runs of consecutive digits, English characters, or special characters. If such a run appears, the text is considered abnormal and is discarded after an alert is raised.
The second is timeout detection: a timeout mechanism is applied while the medical text is being structured. Structuring a normal medical text generally takes only a short time, so if the time spent on structuring reaches a certain threshold, the text is considered abnormal and is then discarded.
Both methods judge abnormal text along a single dimension. When checking medical text for abnormalities, they discard a large amount of useful data that could have been structured, which leads to serious loss of normal data.
It should be noted that the information disclosed in the Background section above is only intended to aid understanding of the background of the disclosure, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the invention
The purpose of the disclosure is to provide a text processing method, a text processing apparatus, a computer-readable storage medium, and an electronic device, and thereby to overcome, at least to some extent, the technical problem of serious data loss caused by the limitations and defects of the related art.
According to one aspect of the disclosure, a text processing method is provided, comprising:
detecting whether a text to be processed contains an abnormal mark;
if an abnormal mark is detected in the text to be processed, performing text cleaning on the abnormal mark;
performing structuring processing on the text to be processed to obtain structured data.
In an exemplary embodiment of the disclosure, detecting whether the text to be processed contains an abnormal mark includes:
detecting the length of the text to be processed, and judging whether the length is greater than a preset threshold;
if the length is greater than the preset threshold, detecting whether the text to be processed contains an abnormal mark.
In an exemplary embodiment of the disclosure, detecting whether the text to be processed contains an abnormal mark further includes:
if the length is less than or equal to the preset threshold, performing structuring processing on the text to be processed to obtain structured data.
In an exemplary embodiment of the disclosure, performing structuring processing on the text to be processed to obtain structured data includes:
performing abnormal feature detection on the text to be processed, to judge whether the text to be processed is normal text or abnormal text;
if the text to be processed is judged to be normal text, performing structuring processing on the text to be processed to obtain structured data.
In an exemplary embodiment of the disclosure, performing abnormal feature detection on the text to be processed, to judge whether the text to be processed is normal text or abnormal text, includes:
detecting whether the text to be processed contains a continuous non-Chinese field;
if a continuous non-Chinese field is detected in the text to be processed, judging the text to be processed to be abnormal text;
if no continuous non-Chinese field is detected in the text to be processed, judging the text to be processed to be normal text.
In an exemplary embodiment of the disclosure, performing structuring processing on the text to be processed to obtain structured data further includes:
if the text to be processed is judged to be abnormal text, importing the abnormal text into an abnormal text set;
when the abnormal text set meets a preset condition, sending abnormal text prompt information.
In an exemplary embodiment of the disclosure, after the abnormal text prompt information is sent, the method further includes:
analyzing the abnormal texts in the abnormal text set and obtaining abnormal marks to form an abnormal mark set.
According to one aspect of the disclosure, a text processing apparatus is provided, comprising:
a detection module configured to detect whether a text to be processed contains an abnormal mark;
a cleaning module configured to, if an abnormal mark is detected in the text to be processed, perform text cleaning on the part of the text to be processed that contains the abnormal mark;
a processing module configured to perform structuring processing on the text to be processed to obtain structured data.
According to one aspect of the disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, it implements any of the text processing methods described above.
According to one aspect of the disclosure, an electronic device is provided, comprising a processor and a memory, wherein the memory stores instructions executable by the processor, and the processor is configured to perform any of the text processing methods described above by executing the executable instructions.
By performing abnormal mark detection on the text to be processed, the text processing method provided by embodiments of the disclosure cleans the abnormal parts of the text rather than simply discarding the whole text. It therefore retains as much of the valid data in the text to be processed as possible and avoids data loss.
It should be understood that the general description above and the detailed description below are only exemplary and explanatory, and do not limit the disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and, together with the specification, serve to explain the principles of the disclosure. The drawings described below are evidently only some embodiments of the disclosure; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 schematically shows a flow chart of the steps of the text processing method in an exemplary embodiment of the disclosure.
Fig. 2 schematically shows a flow chart of the steps of the text processing method in another exemplary embodiment of the disclosure.
Fig. 3 schematically shows a flow chart of the steps of the text processing method in another exemplary embodiment of the disclosure.
Fig. 4 schematically shows a flow chart of the steps of the text processing method in another exemplary embodiment of the disclosure.
Fig. 5 schematically shows a block diagram of the text processing apparatus in an exemplary embodiment of the disclosure.
Fig. 6 schematically shows a program product in an exemplary embodiment of the disclosure.
Fig. 7 schematically shows a module diagram of an electronic device in an exemplary embodiment of the disclosure.
Fig. 8 schematically shows a flow chart of the text processing method of an exemplary embodiment of the disclosure in an application scenario.
Detailed description of the embodiments
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In addition, the drawings are only schematic illustrations of the disclosure and are not necessarily drawn to scale. Identical reference numerals in the drawings denote identical or similar parts, so repeated description of them is omitted. Some of the block diagrams shown in the drawings are functional entities that do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
An exemplary embodiment of the disclosure first provides a text processing method, which can mainly be applied to the structuring of medical text in order to obtain structured data from it. Medical text may include patient medical histories, inpatient case records, and other electronic texts containing medical data. Referring to Fig. 1, the text processing method provided in this embodiment may mainly include the following steps:
Step S110. Detect whether the text to be processed contains an abnormal mark.
This step first examines the content of the text to be processed and judges whether it contains an abnormal mark. An abnormal mark may include garbled characters appearing in the text, or meaningless strings of digits, letters, or special characters. To detect abnormal marks more effectively, an abnormal mark detection rule can be formulated, and anything that satisfies the rule is judged to be an abnormal mark. Alternatively, an abnormal mark set can be preset, and the abnormal marks in that set are matched against the text to be processed; if a match is found, the matched text is judged to be an abnormal mark.
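The two detection options in step S110 (a rule, or a preset abnormal mark set) might be sketched as follows. This is only an illustration: the rule `ABNORMAL_RULE`, the contents of `ABNORMAL_MARK_SET`, and the function name `find_abnormal_marks` are assumptions and do not come from the disclosure, which leaves the rule and the set unspecified.

```python
import re

# Hypothetical rule: 20 or more consecutive printable ASCII characters
# (letters, digits, or symbols). The disclosure does not fix a rule.
ABNORMAL_RULE = re.compile(r"[!-~]{20,}")

# Hypothetical preset abnormal mark set of known garbage strings.
ABNORMAL_MARK_SET = {"\ufffd", "&nbsp;", "NULLNULL"}


def find_abnormal_marks(text):
    """Return abnormal marks found by the rule, then by set lookup."""
    found = [match.group(0) for match in ABNORMAL_RULE.finditer(text)]
    found += [mark for mark in sorted(ABNORMAL_MARK_SET) if mark in text]
    return found
```

An empty result means the text can go on to structuring unchanged; a non-empty result is what the cleaning step would act on.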
Step S120. If an abnormal mark is detected in the text to be processed, perform text cleaning on the abnormal mark.
Based on the detection result of step S110, if an abnormal mark is detected in the text to be processed, this step performs text cleaning on that abnormal mark. The purpose of text cleaning is to ensure that the text to be processed no longer contains abnormal marks. To improve the cleaning effect, this embodiment can clean the abnormal marks in a loop, that is, repeat detection and cleaning until no abnormal mark can be detected in the text to be processed, at which point cleaning is considered complete. An abnormal mark can be cleaned by deleting it directly, by replacing it with a specific recognizable marker, or by any other text processing approach; this exemplary embodiment imposes no particular limitation on this.
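The loop cleaning described for step S120 can be illustrated with a minimal sketch. The function name `clean_abnormal_marks`, its parameters, and the `max_rounds` safety cap are assumptions made for this example; the disclosure only states that detection and cleaning repeat until no abnormal mark remains.

```python
def clean_abnormal_marks(text, marks, replacement="", max_rounds=100):
    """Repeat detection and cleaning until no abnormal mark is left.

    `marks` is a hypothetical list of known abnormal marks. `max_rounds`
    is a safety cap (an assumption) in case `replacement` itself
    contains a mark and the loop would otherwise never end.
    """
    for _ in range(max_rounds):
        hits = [mark for mark in marks if mark in text]
        if not hits:
            break  # nothing detected: cleaning is complete
        for mark in hits:
            text = text.replace(mark, replacement)
    return text
```

A single pass is not always enough: deleting one mark can join the text on either side of it into a new occurrence, which is why the sketch detects again after every cleaning round.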
Step S130. Perform structuring processing on the text to be processed to obtain structured data.
After the text cleaning in step S120, this step performs structuring processing on the text to be processed, which is now free of abnormal marks, to obtain structured data. In general, structuring processing first segments the text to be processed into words, then performs information extraction on the segmented text, and finally yields structured data. Structuring processing can make use of a data processing platform or compute engine such as Apache Spark.
By performing abnormal mark detection on the text to be processed, the text processing method of this exemplary embodiment cleans the abnormal parts of the text rather than simply discarding the whole text. It therefore retains as much of the valid data in the text to be processed as possible and avoids serious data loss.
On the basis of the foregoing exemplary embodiment, another embodiment of the disclosure provides a text processing method in which step S110, detecting whether the text to be processed contains an abnormal mark, may include the following steps, as shown in Fig. 2:
Step S211. Detect the length of the text to be processed, and judge whether the length is greater than a preset threshold.
Because abnormal medical text usually consists of very long runs of digits, English letters, or special characters, this step detects the text length before abnormal marks are detected and cleaned, and judges whether the detected length is greater than a preset threshold. The preset threshold may depend directly on information such as the source and type of the text to be processed, or may be derived from statistical analysis of historical text processing data; for example, it may be 2000 characters or 3000 characters. This exemplary embodiment imposes no particular limitation on this.
Step S212. If the length is greater than the preset threshold, detect whether the text to be processed contains an abnormal mark.
Based on the detection result of step S211, if the length of the text to be processed is greater than the preset threshold, the text is considered likely to be abnormal. Detection of abnormal marks in the text to be processed then proceeds, followed by text cleaning of the abnormal marks and the subsequent structuring processing.
On the basis of this embodiment, step S110, detecting whether the text to be processed contains an abnormal mark, may further include the following step:
Step S213. If the length is less than or equal to the preset threshold, perform structuring processing on the text to be processed to obtain structured data.
If the detection result of step S211 is that the length of the text to be processed is less than or equal to the preset threshold, the text can be considered normal. It enters the normal structuring flow directly and is structured to obtain structured data.
By checking the length of the text to be processed, the text processing method of this embodiment identifies suspected abnormal text at minimal computational cost, which avoids wasting resources and improves text processing efficiency.
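The length gate of steps S211 to S213 amounts to one cheap comparison before any pattern matching. The sketch below assumes a threshold of 2000 characters, one of the example values mentioned in the text; the names `LENGTH_THRESHOLD` and `route_by_length` and the returned labels are hypothetical.

```python
# 2000 characters is one of the example thresholds given in the text.
LENGTH_THRESHOLD = 2000


def route_by_length(text, threshold=LENGTH_THRESHOLD):
    """Choose the processing path from the text length alone.

    Texts longer than the threshold are suspected to be abnormal and go
    through abnormal mark detection and cleaning first (step S212);
    all other texts go straight to structuring (step S213).
    """
    if len(text) > threshold:
        return "detect_and_clean"
    return "structure_directly"
```

Because `len` on a Python string is a constant-time operation, the gate adds almost no cost for the common case of short, normal text.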
On the basis of the foregoing exemplary embodiment, another embodiment of the disclosure provides a text processing method in which step S130, performing structuring processing on the text to be processed to obtain structured data, may include the following steps, as shown in Fig. 3:
Step S331. Perform abnormal feature detection on the text to be processed, to judge whether the text to be processed is normal text or abnormal text.
After abnormal marks have been detected and cleaned, this step performs abnormal feature detection on the text to be processed, in order to judge more accurately whether it is normal or abnormal. As with abnormal marks, the abnormal features detected in this step may include garbled characters and meaningless strings of digits, letters, or special characters. In other words, the abnormal feature detection in this step overlaps in part with the abnormal mark detection in step S110. The abnormal features detected in this step may also include other features that are generated from historical data or preset; in particular, they may cover abnormal parts that abnormal mark detection missed, or that were detected but not cleaned. An abnormal part may be missed by abnormal mark detection for several reasons: for example, the abnormal mark set may be incomplete, or omissions may occur when large volumes of text are processed. An abnormal part may be detected but not cleaned for several reasons as well: for example, editing permissions set on the text to be processed may prevent the abnormal part from being deleted or replaced. In addition, before the abnormal feature detection of this step, the length of the text to be processed may first be checked, and the abnormal feature detection performed only if the length exceeds the preset threshold. If the length does not exceed the preset threshold, this step can be skipped and the text structured directly, which saves computing resources and improves text processing efficiency.
Step S332. If the text to be processed is judged to be normal text, perform structuring processing on the text to be processed to obtain structured data.
After the detection and judgment of step S331, if the result is that the text to be processed is normal text, this step performs structuring processing on it to obtain structured data.
On the basis of this embodiment, step S130, performing structuring processing on the text to be processed to obtain structured data, may further include the following steps:
Step S333. If the text to be processed is judged to be abnormal text, import the abnormal text into an abnormal text set.
After the detection and judgment of step S331, if the result is that the text to be processed is abnormal text, this step imports it into the abnormal text set. Abnormal text imported into the abnormal text set is not structured for the time being, so that it does not affect the structuring of normal text.
Step S334. When the abnormal text set meets a preset condition, send abnormal text prompt information.
Following step S333, as abnormal texts continue to be detected, they continue to be imported into the abnormal text set; once the set meets the preset condition, this step sends abnormal text prompt information. The preset condition may be that the number of abnormal texts in the set exceeds a preset threshold, that a certain point in time has been reached since the first abnormal text was imported into the set, or that a batch of text processing work has finished and the set contains abnormal text; this exemplary embodiment imposes no particular limitation on this. The abnormal text prompt information may be any information that serves as a prompt or warning, such as an alarm e-mail sent to the relevant business personnel. It may include the content of the abnormal texts or the path where they are stored. In some related art, prompt information is sent every time an abnormal text is detected, so prompts are too frequent and redundant, which harms the user experience. By setting a preset condition, this embodiment keeps the frequency and number of abnormal text prompts under control and improves the user experience.
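The batching behavior of steps S333 and S334 might look like the sketch below, which covers two of the preset conditions named above: a count threshold and a time limit measured from the first imported text. The class name `AbnormalTextSet`, its default values, and the injectable `clock` are assumptions for illustration; actually sending the prompt (for example an alarm e-mail) is left out.

```python
import time


class AbnormalTextSet:
    """Collects abnormal texts and decides when one batched prompt is due."""

    def __init__(self, max_count=100, max_age_seconds=3600.0,
                 clock=time.monotonic):
        self.max_count = max_count              # hypothetical count threshold
        self.max_age_seconds = max_age_seconds  # hypothetical time limit
        self.clock = clock                      # injectable, eases testing
        self.texts = []
        self.first_added_at = None

    def add(self, text):
        """Import one abnormal text; remember when the first one arrived."""
        if not self.texts:
            self.first_added_at = self.clock()
        self.texts.append(text)

    def should_alert(self):
        """True once the count or the age condition is met."""
        if not self.texts:
            return False
        if len(self.texts) > self.max_count:
            return True
        return self.clock() - self.first_added_at >= self.max_age_seconds

    def drain(self):
        """Return the collected texts for one prompt and reset the set."""
        batch, self.texts, self.first_added_at = self.texts, [], None
        return batch
```

A caller would check `should_alert()` after each `add()` and, when it returns true, build a single prompt from `drain()`, so many abnormal texts produce one message rather than one message each.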
Preferably, after the abnormal text prompt information is sent, the text processing method of this embodiment may further include the step of analyzing the abnormal texts in the abnormal text set and obtaining abnormal marks to form an abnormal mark set. This step analyzes the abnormal texts, finds the abnormal parts in them, and obtains abnormal marks from those parts to form the abnormal mark set. Abnormal marks obtained later can be added to the set. The abnormal mark set supports the abnormal mark detection in step S110. As the set grows richer and more complete, abnormal marks are detected and cleaned more thoroughly, fewer texts are judged abnormal, and text processing efficiency improves considerably.
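The disclosure does not say how abnormal texts are analyzed to obtain abnormal marks; the analysis could well be manual. Purely as one hypothetical heuristic, the sketch below proposes as candidate marks the long non-Chinese runs that recur across several abnormal texts. The pattern, the thresholds, and the function name `extract_candidate_marks` are all assumptions.

```python
import re
from collections import Counter

# Assumption: a candidate mark is a run of 11 or more characters outside
# the CJK Unified Ideographs block U+4E00 to U+9FFF.
NON_CHINESE_RUN = re.compile(r"[^\u4e00-\u9fff]{11,}")


def extract_candidate_marks(abnormal_texts, min_occurrences=2):
    """Return runs that occur in at least `min_occurrences` texts."""
    counts = Counter()
    for text in abnormal_texts:
        # set() counts each run at most once per text.
        counts.update(set(NON_CHINESE_RUN.findall(text)))
    return {run for run, n in counts.items() if n >= min_occurrences}
```

The returned candidates could be reviewed and then merged into the existing abnormal mark set, for example with `mark_set |= extract_candidate_marks(texts)`.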
Referring to Fig. 4, in another exemplary embodiment of the disclosure, step S331, performing abnormal feature detection on the text to be processed to judge whether it is normal text or abnormal text, may further include the following steps:
Step S3311. Detect whether the text to be processed contains a continuous non-Chinese field.
This step first detects whether the text to be processed contains a continuous non-Chinese field, that is, a continuous field made up of non-Chinese characters such as digits, English letters, or special characters.
Step S3312. If a continuous non-Chinese field is detected in the text to be processed, judge the text to be processed to be abnormal text.
If step S3311 detects a continuous non-Chinese field in the text to be processed, this step judges the text to be processed to be abnormal text.
Step S3313. If no continuous non-Chinese field is detected in the text to be processed, judge the text to be processed to be normal text.
If step S3311 detects no continuous non-Chinese field in the text to be processed, this step judges the text to be processed to be normal text.
Further, this embodiment may also place a limit on the length of a continuous non-Chinese field. For example, only a run of non-Chinese characters longer than 10 characters is regarded as a continuous non-Chinese field, while a run shorter than 10 characters is regarded as a normal text field.
By using continuous non-Chinese fields for abnormal feature detection, this embodiment suits the linguistic characteristics of Chinese medical text and detects abnormal text efficiently.
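Steps S3311 to S3313, together with the length limit of 10 characters given as an example, can be sketched with a single regular expression. Several points are assumptions: "Chinese character" is approximated by the CJK Unified Ideographs block U+4E00 to U+9FFF, so Chinese punctuation and whitespace count as non-Chinese here; a run of exactly 10 characters is treated as normal, since the text leaves that case open; and the function name `is_abnormal_text` is hypothetical.

```python
import re


def is_abnormal_text(text, max_normal_length=10):
    """Judge a text abnormal if it has a long continuous non-Chinese field.

    A run of more than `max_normal_length` characters outside U+4E00 to
    U+9FFF counts as a continuous non-Chinese field.
    """
    pattern = re.compile(r"[^\u4e00-\u9fff]{%d,}" % (max_normal_length + 1))
    return pattern.search(text) is not None
```

The bounded quantifier only has to find one qualifying run, so the check stops at the first match and stays cheap compared with running the full set of extraction patterns over an abnormal text.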
It should be noted that although the foregoing exemplary embodiments describe the steps of the method in a particular order, this does not require or imply that the steps must be performed in that order, or that all of the steps must be performed, to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be merged into one, and/or one step may be split into several.
An exemplary embodiment of the disclosure also provides a text processing apparatus. Referring to Fig. 5, the text processing apparatus 50 may mainly include a detection module 51, a cleaning module 52, and a processing module 53. The detection module 51 is configured to detect whether the text to be processed contains an abnormal mark. The cleaning module 52 is configured to, if an abnormal mark is detected in the text to be processed, perform text cleaning on the part of the text to be processed that contains the abnormal mark. The processing module 53 is configured to perform structuring processing on the text to be processed to obtain structured data.
The details of the above text processing apparatus have already been described in the corresponding text processing method and are not repeated here.
It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.
An exemplary embodiment of the disclosure also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the text processing method of the disclosure described above. In some possible embodiments, aspects of the disclosure may also be implemented as a program product that includes program code. The program product may be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or portable hard disk) or on a network. When the program product runs on a computing device (such as a personal computer, server, terminal device, or network device), the program code causes the computing device to perform the method steps of the exemplary embodiments of the disclosure described above.
Referring to Fig. 6, a program product 60 for implementing the above method according to an embodiment of the disclosure may take the form of a portable compact disc read-only memory (CD-ROM) that includes program code, and may run on a computing device (such as a personal computer, server, terminal device, or network device). The program product of the disclosure is not limited to this, however. In this exemplary embodiment, the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in connection with an instruction execution system, apparatus, or device.
The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium.
A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of these. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of these.
A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of these. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of these.
Program code for carrying out the operations of the disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as C or similar languages. The program code may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).
In an exemplary embodiment of the present disclosure, an electronic device is also provided. The electronic device includes at least one processor and at least one memory for storing instructions executable by the processor, wherein the processor is configured to perform the method steps of the above exemplary embodiments of the present disclosure by executing the executable instructions.
The electronic device 700 of this exemplary embodiment is described below with reference to Fig. 7. The electronic device 700 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in Fig. 7, the electronic device 700 takes the form of a general-purpose computing device. Its components may include, but are not limited to: at least one processing unit 710, at least one storage unit 720, a bus 730 connecting the different system components (including the processing unit 710 and the storage unit 720), and a display unit 740.
The storage unit 720 stores program code that can be executed by the processing unit 710, so that the processing unit 710 performs the method steps of the above exemplary embodiments of the present disclosure.
The storage unit 720 may include readable media in the form of volatile storage units, such as a random access storage unit 721 (RAM) and/or a cache storage unit 722, and may further include a read-only storage unit 723 (ROM).
The storage unit 720 may also include a program/utility 724 having a set of (at least one) program modules 725. Such program modules include, but are not limited to: an operating system, one or more application programs, other program modules, and program data. Each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 730 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 700 may also communicate with one or more external devices 800 (such as a keyboard, a pointing device, or a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 700, and/or with any device (such as a router or a modem) that enables the electronic device 700 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 750. The electronic device 700 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 760. As shown in Fig. 7, the network adapter 760 may communicate with the other modules of the electronic device 700 through the bus 730. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 700, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Those skilled in the art will appreciate that aspects of the present disclosure may be implemented as a system, a method, or a program product. Accordingly, aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software, which may be collectively referred to herein as a "circuit", "module", or "system".
The embodiments of the disclosure are explained below with reference to an application scenario. As shown in Fig. 8, the main flow of text processing in this application scenario is as follows:
1. Abnormal medical texts are generally overly long runs of digits, English letters, or special characters. The length of the text is therefore checked first: when the length of a medical text is greater than a certain threshold, the text is considered possibly abnormal. A text whose length is less than the threshold is considered normal and enters the normal structuring process.
2. The suspected abnormal text is then cleaned. It is checked for abnormal marks and, if any are found, cleaned in a loop: checking and cleaning are repeated until no abnormal mark can be found in the text, at which point cleaning is considered complete.
3. True abnormal-feature detection is performed on the cleaned suspected abnormal text by checking it for consecutive non-Chinese characters. If the check finds the cleaned text to be abnormal, it is added to the abnormal text set; if it is normal, normal structuring is performed.
4. Finally, once all texts have been structured, the abnormal text set is written to disk and the paths are recorded. An e-mail alert containing part of the content of the abnormal texts and their paths is then sent to the relevant person in charge.
5. The relevant person in charge reviews the corresponding abnormal texts and extracts the corresponding abnormal marks into the abnormal mark set, which provides the features used for cleaning suspected abnormal texts.
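The first three steps of the flow above can be sketched in Python. This is a minimal illustration, not the patent's implementation: the threshold value, the marker list, the 30-character run length, and the function names `clean` and `classify` are all assumptions made for the example, since the source specifies none of them.

```python
import re

# Hypothetical values; the source gives no concrete threshold or marker set.
LENGTH_THRESHOLD = 2000
ABNORMAL_MARKERS = ["<br/>", "&nbsp;"]
# Assumed cutoff: 30 or more consecutive characters outside the basic
# CJK Unified Ideographs block count as a "consecutive non-Chinese" run.
NON_CHINESE_RUN = re.compile(r"[^\u4e00-\u9fff]{30,}")


def clean(text, markers=ABNORMAL_MARKERS):
    """Step 2: repeat check-and-clean until no abnormal marker remains."""
    markers = [m for m in markers if m]  # an empty marker would never be exhausted
    while any(m in text for m in markers):
        for m in markers:
            text = text.replace(m, "")
    return text


def classify(text, threshold=LENGTH_THRESHOLD, markers=ABNORMAL_MARKERS):
    """Return (label, text), where label is 'normal' or 'abnormal'."""
    # Step 1: length check. Claim 3 treats a length less than or equal
    # to the threshold as normal, so the same comparison is used here.
    if len(text) <= threshold:
        return "normal", text
    cleaned = clean(text, markers)        # step 2: loop cleaning
    if NON_CHINESE_RUN.search(cleaned):   # step 3: abnormal-feature check
        return "abnormal", cleaned
    return "normal", cleaned
```

The loop in `clean` matters because removing one marker can splice the surrounding characters into a new marker, so a single pass is not always enough. Texts labelled `normal` would then go on to structuring, and texts labelled `abnormal` would be collected in the abnormal text set.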
In this application scenario, functions such as length detection and data cleaning are added to abnormal text detection, and a centralized alarm method is used for the abnormal text alarm mechanism, which improves the user experience.
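The centralized alarm described in step 4 could look roughly like the following sketch, which writes every abnormal text to disk, records its path, and builds a single summary e-mail rather than one alert per text. The function name `build_alert`, the file naming scheme, and the preview length are hypothetical; actually sending the message (for example with `smtplib`) is left out.

```python
from email.message import EmailMessage
from pathlib import Path


def build_alert(abnormal_texts, out_dir, recipient, preview_len=50):
    """Write each abnormal text to disk and build one summary e-mail."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    lines = []
    for i, text in enumerate(abnormal_texts):
        # Hypothetical naming scheme: one file per abnormal text.
        path = out_dir / f"abnormal_{i}.txt"
        path.write_text(text, encoding="utf-8")
        # Record the path together with a short preview of the content.
        lines.append(f"{path}: {text[:preview_len]}")
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = f"{len(abnormal_texts)} abnormal text(s) detected"
    msg.set_content("\n".join(lines))
    return msg
```

Batching all abnormal texts into one message keeps the person in charge from receiving a separate alert for every text, which is one plausible reading of the "centralized" alarm the scenario mentions.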
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and examples are to be regarded as illustrative only, with the true scope and spirit of the disclosure being indicated by the appended claims.
The features, structures, or characteristics described above may be combined in any suitable manner in one or more embodiments, and, where possible, the features discussed in the embodiments are interchangeable. In the above description, many specific details are provided to give a thorough understanding of the embodiments of the present disclosure. Those skilled in the art will appreciate, however, that the technical solutions of the present disclosure may be practiced without one or more of the specific details, or with other methods, components, materials, and so on. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the present disclosure.
Claims (10)
1. A text processing method, characterized by comprising:
detecting whether a text to be processed contains an abnormal mark;
if an abnormal mark is detected in the text to be processed, performing text cleaning on the abnormal mark;
performing structuring processing on the text to be processed to obtain structured data.
2. The abnormal text processing method according to claim 1, characterized in that detecting whether the text to be processed contains an abnormal mark comprises:
detecting the length of the text to be processed, and judging whether the length is greater than a preset threshold;
if the length is judged to be greater than the preset threshold, detecting whether the text to be processed contains an abnormal mark.
3. The abnormal text processing method according to claim 2, characterized in that detecting whether the text to be processed contains an abnormal mark further comprises:
if the length is judged to be less than or equal to the preset threshold, performing structuring processing on the text to be processed to obtain structured data.
4. The text processing method according to claim 1, characterized in that performing structuring processing on the text to be processed to obtain structured data comprises:
performing abnormal-feature detection on the text to be processed to judge whether the text to be processed is a normal text or an abnormal text;
if the text to be processed is judged to be a normal text, performing structuring processing on the text to be processed to obtain structured data.
5. The text processing method according to claim 4, characterized in that performing abnormal-feature detection on the text to be processed to judge whether the text to be processed is a normal text or an abnormal text comprises:
detecting whether the text to be processed contains consecutive non-Chinese fields;
if consecutive non-Chinese fields are detected in the text to be processed, judging that the text to be processed is an abnormal text;
if no consecutive non-Chinese fields are detected in the text to be processed, judging that the text to be processed is a normal text.
6. The text processing method according to claim 4, characterized in that performing structuring processing on the text to be processed to obtain structured data further comprises:
if the text to be processed is judged to be an abnormal text, adding the abnormal text to an abnormal text set;
when the abnormal text set meets a preset condition, sending abnormal text prompt information.
7. The text processing method according to claim 6, characterized in that, after the abnormal text prompt information is sent, the method further comprises:
analyzing the abnormal texts in the abnormal text set to obtain abnormal marks and form an abnormal mark set.
8. A text processing apparatus, characterized by comprising:
a detection module configured to detect whether a text to be processed contains an abnormal mark;
a cleaning module configured to, if an abnormal mark is detected in the text to be processed, perform text cleaning on the part of the text to be processed that contains the abnormal mark;
a processing module configured to perform structuring processing on the text to be processed to obtain structured data.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the text processing method according to any one of claims 1-7.
10. An electronic device, characterized by comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to perform the text processing method according to any one of claims 1-7 by executing the executable instructions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811413346.9A CN109284483B (en) | 2018-11-23 | 2018-11-23 | Text processing method and device, storage medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109284483A (en) | 2019-01-29 |
CN109284483B (en) | 2023-06-30 |
Family
ID=65172631
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811413346.9A Active CN109284483B (en) | 2018-11-23 | 2018-11-23 | Text processing method and device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109284483B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105260357A (en) * | 2015-10-14 | 2016-01-20 | 北京京东尚科信息技术有限公司 | Sensitive word checking method and device based on Hash sensitive words directed graph |
US20160253313A1 (en) * | 2015-02-27 | 2016-09-01 | Nuance Communications, Inc. | Updating language databases using crowd-sourced input |
CN106445915A (en) * | 2016-09-14 | 2017-02-22 | 科大讯飞股份有限公司 | New word discovery method and device |
CN107657060A (en) * | 2017-10-20 | 2018-02-02 | 中电科新型智慧城市研究院有限公司 | A kind of characteristic optimization method based on semi-structured text classification |
CN108228851A (en) * | 2018-01-10 | 2018-06-29 | 北京奇艺世纪科技有限公司 | A kind of lists of keywords method of adjustment, device and electronic equipment |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112114987A (en) * | 2019-06-20 | 2020-12-22 | 腾讯科技(深圳)有限公司 | Method and device for detecting abnormity of operating environment, intelligent terminal and storage medium |
CN112114987B (en) * | 2019-06-20 | 2024-04-09 | 腾讯科技(深圳)有限公司 | Abnormality detection method and device for operation environment, intelligent terminal and storage medium |
WO2021032055A1 (en) * | 2019-08-19 | 2021-02-25 | 金色熊猫有限公司 | Automatic entry method and device for clinical trial reports, electronic equipment, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109284483B (en) | 2023-06-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8370278B2 (en) | Ontological categorization of question concepts from document summaries | |
US20190121842A1 (en) | Content adjustment and display augmentation for communication | |
US10067983B2 (en) | Analyzing tickets using discourse cues in communication logs | |
CN109408829B (en) | Method, device, equipment and medium for determining readability of article | |
CN108416216A (en) | leak detection method, device and computing device | |
CN108776696B (en) | Node configuration method and device, storage medium and electronic equipment | |
CN111316232A (en) | Providing optimization using annotations of programs | |
CN114328208A (en) | Code detection method and device, electronic equipment and storage medium | |
US11093712B2 (en) | User interfaces for word processors | |
US20230078134A1 (en) | Classification of erroneous cell data | |
CN111339768B (en) | Sensitive text detection method, system, electronic equipment and medium | |
CN109284483A (en) | Text handling method, device, storage medium and electronic equipment | |
CN110647523B (en) | Data quality analysis method and device, storage medium and electronic equipment | |
CN115576828A (en) | Test case generation method, device, equipment and storage medium | |
Di Sorbo et al. | An nlp-based tool for software artifacts analysis | |
CN109710523B (en) | Visual draft test case generation method and device, storage medium and electronic equipment | |
Fischbach et al. | Cira: A tool for the automatic detection of causal relationships in requirements artifacts | |
CN113869789A (en) | Risk monitoring method and device, computer equipment and storage medium | |
CN115687651A (en) | Knowledge graph construction method and device, electronic equipment and storage medium | |
CN114925757A (en) | Multi-source threat intelligence fusion method, device, equipment and storage medium | |
US20220277176A1 (en) | Log classification using machine learning | |
CN110427330B (en) | Code analysis method and related device | |
CN113900956A (en) | Test case generation method and device, computer equipment and storage medium | |
WO2018060777A1 (en) | Method and system for optimizing software testing | |
Komninos et al. | Mobile text entry behaviour in lab and in-the-wild studies: is it different? |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||