CN110427773A - Information processing method, device, storage medium and terminal device - Google Patents

Publication number
CN110427773A
Authority
CN
China
Prior art keywords
character string
text
content
processed
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910575551.3A
Other languages
Chinese (zh)
Inventor
唐志辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910575551.3A priority Critical patent/CN110427773A/en
Priority to PCT/CN2019/103028 priority patent/WO2020258492A1/en
Publication of CN110427773A publication Critical patent/CN110427773A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The present invention relates to the field of data processing, and in particular to an information processing method, device, storage medium and terminal device. The method comprises: obtaining a file that contains text content, and formatting the text content of the file into a string array; matching the string array against a regular expression to obtain the strings to be processed, namely the strings in the string array that satisfy the regular expression; determining, according to the regular expression and each string to be processed, the matching string corresponding to that string; and obtaining the text content to be processed that corresponds to the string to be processed in the file, and replacing it with the matching text content corresponding to the matching string. The invention can process text content in files in batches and improves the efficiency of file content processing.

Description

Information processing method, device, storage medium and terminal device
Technical field
The present invention relates to the field of data processing, and in particular to an information processing method, device, storage medium and terminal device.
Background
With the development of internet information technology, company information, contract information, personal information, customer information and the like are all recorded on computer equipment. At different times, or for different users, part of the recorded information needs special processing; for example, when the information is released to the public, important or sensitive information must be shielded. At present, such information is usually protected either by encrypting the file that stores it or by manually modifying the important or sensitive information in the file item by item. Encrypting the whole file does not allow the file to be tailored to different users. Moreover, if the file must be processed differently for different users, modifying it manually item by item consumes substantial resources and wastes time and labor.
Summary of the invention
To overcome the above technical problems, in particular the inability of the prior art to process file content efficiently and at low cost, the following technical solutions are proposed:
In a first aspect, the present invention provides an information processing method, comprising:
obtaining a file that contains text content, and formatting the text content of the file into a string array;
screening the string array according to a regular expression to obtain the strings to be processed, namely the strings in the string array that satisfy the regular expression;
replacing characters contained in the string to be processed according to the regular expression to generate the matching string corresponding to the string to be processed, inverse-formatting the matching string, and determining the matching text content corresponding to the matching string;
obtaining the text content to be processed that corresponds to the string to be processed in the file, and replacing the text content to be processed with the matching text content corresponding to the matching string.
Further, after obtaining the text content to be processed that corresponds to the string to be processed in the file and replacing it with the matching text content corresponding to the matching string, the method further comprises:
obtaining the original text format of the text content to be processed, and setting the text format of the matching text content to the original text format.
Further, after screening the string array according to the regular expression to obtain the strings to be processed that satisfy the regular expression, the method further comprises:
saving the association among the regular expression, the string to be processed and the matching string to a configuration file;
receiving a modification to the regular expression in the configuration file and modifying the corresponding matching string accordingly, or
receiving a modification to the matching string in the configuration file and modifying the corresponding regular expression accordingly.
Further, after replacing the text content to be processed with the matching text content corresponding to the matching string, the method further comprises:
generating a text content replacement file from the replaced matching text content;
counting the accuracy of the text content replacement file, and determining a target regular expression according to the accuracy;
regenerating the text content replacement file according to the target regular expression.
Further, before screening the string array according to the regular expression to obtain the strings to be processed that satisfy the regular expression, the method comprises:
obtaining attribute information of the file, and matching, according to preset rules, the regular expression corresponding to the attribute information of the file.
Further, before screening the string array according to the regular expression to obtain the strings to be processed that satisfy the regular expression, the method comprises:
judging whether the file format of the file is a preset format and, if not, converting the file format to the preset format;
obtaining the regular expression corresponding to the preset format.
In a second aspect, the present invention provides an information processing device, comprising:
a formatting module, configured to obtain a file that contains text content and format the text content of the file into a string array;
a matching module, configured to screen the string array according to a regular expression and obtain the strings to be processed, namely the strings in the string array that satisfy the regular expression;
a determining module, configured to replace characters contained in the string to be processed according to the regular expression to generate the matching string corresponding to the string to be processed, inverse-format the matching string, and determine the matching text content corresponding to the matching string;
a replacement module, configured to obtain the text content to be processed that corresponds to the string to be processed in the file and replace it with the matching text content corresponding to the matching string.
Further, the replacement module is also configured to:
obtain the original text format of the text content to be processed, and set the text format of the matching text content to the original text format.
In a third aspect, the present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the above information processing method.
In a fourth aspect, the present invention also provides a terminal device comprising one or more processors, a memory, and one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to perform the above information processing method.
Compared with the prior art, the present invention has the following beneficial effects:
The present invention provides a method for processing specific text content in a document. After the file is obtained, its text content is formatted into individual strings, so that the text content of the file is treated as a string array composed of those strings. The regular expression for processing the text content of the file is then obtained, and the strings to be processed, namely those in the string array that satisfy the regular expression, are screened out. Characters contained in each string to be processed are replaced according to the regular expression to generate the corresponding matching string, and the matching string is inverse-formatted to determine the corresponding matching text content. Once the strings to be processed and their matching strings have been determined, opening the file is simulated and a replacement operation is simulated: the text content to be processed that corresponds to each string to be processed in the file is obtained and replaced with the matching text content corresponding to the matching string. This completes the conversion of the text content to be processed in the file and achieves shielding, modification or masking of important or sensitive information in the file. Processing file content in batches saves time and improves the efficiency of processing text content in files.
Additional aspects and advantages of the present invention will be set forth in part in the following description; they will become apparent from the description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of one embodiment of the information processing method of the present invention;
Fig. 2 is a schematic flowchart of another embodiment of the information processing method of the present invention;
Fig. 3 is a schematic diagram of one embodiment of the information processing device of the present invention;
Fig. 4 is a schematic structural diagram of one embodiment of the terminal device of the present invention.
Specific embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting the claims.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural forms. It should further be understood that the word "comprising" used in the specification of the present invention indicates the presence of the stated features, integers, steps or operations, but does not exclude the presence or addition of one or more other features, integers, steps or operations.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries are to be interpreted as having a meaning consistent with their meaning in the context of the prior art and, unless specifically defined as here, are not to be interpreted in an idealized or overly formal sense.
Those skilled in the art will understand that "application", "application program", "computer program" and similar expressions as used in the present invention denote the same concept well known to those skilled in the art, namely computer software that is organically constructed from a series of computer instructions and related data resources and is suitable for electronic execution. Unless otherwise specified, this designation is not limited by the type or level of programming language, nor by the operating system or platform on which the software runs. Naturally, the concept is also not limited by any form of terminal.
An embodiment of the present invention provides an information processing method. As shown in Fig. 1, the method comprises the following steps:
S10: obtaining a file that contains text content, and formatting the text content of the file into a string array.
In this embodiment, the content contained in a file, such as company information, personal information or customer information stored in the file, may need specific information to be processed; the processing includes masking, replacement and revision. A file containing text content is first obtained, and text preprocessing is then performed on the file. In one implementation of this embodiment, the text content is formatted. Specifically, a content analysis tool is integrated into the application program of this embodiment; by calling a content analysis tool such as tika to process the text content of the file, the metadata, content and other items of the file are obtained and formatted information is returned, so that the text content of the file is formatted into individual strings. This embodiment defines the collection of those strings as a string array.
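Step S10 can be sketched in Python as follows. This is only an illustration under stated assumptions: the text is taken to have been extracted from the file already (the description names tika as one possible tool, which is not called here), and the helper name `format_as_string_array` and the whitespace-splitting rule are hypothetical, since the description does not fix how text is divided into individual strings.

```python
import re

def format_as_string_array(text: str) -> list[str]:
    """Split extracted text content into an array of non-empty strings.

    Splitting on whitespace is an assumption made for this sketch.
    """
    return [token for token in re.split(r"\s+", text) if token]

# Stand-in for text content extracted from a file.
sample_text = "Customer: Zhang San\nPhone: 13800001111"
string_array = format_as_string_array(sample_text)
print(string_array)
```

A real pipeline would likely keep position information alongside each string so that the corresponding text can be located in the file again in step S40.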
S20: screening the string array according to a regular expression to obtain the strings to be processed, namely the strings in the string array that satisfy the regular expression.
In this embodiment, a configuration file is provided, and the regular expression is configured in the configuration file. In one implementation, the configuration file is an xml configuration file in which the regular expression is defined. A regular expression is a logical formula for operating on strings, covering ordinary characters (for example, the letters a to z) and special characters (called "metacharacters"): a "rule string" is formed from predefined characters and combinations of those characters, and strings are filtered by means of this "rule string". Specifically, according to the rules in the xml configuration file, the string array is screened using the regular expression, the strings that satisfy the rules of the regular expression are found in the string array, and those strings are defined as strings to be processed. The regular expression contains different condition rules, and different strings are matched according to different condition rules; further, a combination of several condition rules can be used to match strings that satisfy all of them, so that the specific text content of the document is screened out. For example, if the condition rule contained in the regular expression is to screen out strings containing the characters "aa", screening the strings in the string array according to that regular expression yields the strings that contain "aa".
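The screening in step S20 can be sketched with Python's standard `re` module. The function name `screen_strings` and the sample data are hypothetical; treating a "combination of condition rules" as a list of patterns that must all match is one possible reading of the description, not something it specifies.

```python
import re

def screen_strings(string_array: list[str], patterns: list[str]) -> list[str]:
    """Return the strings that satisfy every pattern (condition rule)."""
    compiled = [re.compile(p) for p in patterns]
    return [s for s in string_array if all(c.search(s) for c in compiled)]

candidates = ["aaaa", "bbcc", "xaay", "abab"]

# Example from the text: keep the strings that contain "aa".
print(screen_strings(candidates, [r"aa"]))

# Combined rules: contains "aa" and also starts with "x".
print(screen_strings(candidates, [r"aa", r"^x"]))
```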
S30: replacing characters contained in the string to be processed according to the regular expression to generate the matching string corresponding to the string to be processed, inverse-formatting the matching string, and determining the matching text content corresponding to the matching string.
Specific text content in the file is to be processed, including masking, shielding, replacement and the like. In this embodiment, after the strings to be processed are obtained, the characters they contain are replaced according to the regular expression. The replacement includes modifying a character of the string, for example replacing character a with character b; deleting a character of the string, that is, replacing character a with the empty string; and adding characters to the string, for example replacing character a with the characters ab. After the string to be processed has been replaced, the matching string corresponding to it is generated; that is, the string to be processed is replaced according to the rules of the regular expression, and the result of the replacement is still a string, which this embodiment defines as the matching string. The matching string is used subsequently to replace the string to be processed. For example, the matching string corresponding to the string to be processed aaaa is xxxx, and the matching string corresponding to the string to be processed bbcc is xyxy. Strings and text content are related by a mapping; this embodiment inverse-formats the generated matching string to determine the matching text content corresponding to the matching string, and saves the matching text content to the configuration file.
In one implementation of this embodiment, the string to be processed is a string in a general or standard file format and can be converted into other strings of any format.
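The three kinds of replacement described for step S30 (modify, delete, add) can be illustrated with `re.sub`. The helper `build_matching_strings` and the idea of passing the replacement text as a separate argument are assumptions of this sketch; the description only says that the replacement follows the rules of the regular expression.

```python
import re

def build_matching_strings(to_process: list[str], pattern: str, replacement: str) -> dict[str, str]:
    """Map each string to be processed to its matching (replaced) string."""
    compiled = re.compile(pattern)
    return {s: compiled.sub(replacement, s) for s in to_process}

# Modify: replace every "a" with "x", as in the aaaa -> xxxx example.
modified = build_matching_strings(["aaaa"], r"a", "x")
# Delete: replace "a" with the empty string.
deleted = build_matching_strings(["abab"], r"a", "")
# Add: replace "a" with "ab".
added = build_matching_strings(["aca"], r"a", "ab")
print(modified, deleted, added)
```

Keeping the result as a mapping from original string to matching string mirrors the association that the description later saves to the configuration file.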
S40: obtaining the text content to be processed that corresponds to the string to be processed in the file, and replacing the text content to be processed with the matching text content corresponding to the matching string.
In this embodiment, once the strings to be processed in the file and the matching string corresponding to each of them have been determined in the xml configuration file, opening the file is simulated. Specifically, the file format type of the file is obtained, and the way to simulate opening a file of that type is determined from it, including simulating the opening of a .txt file, a .doc file or a .pdf file. The text content replacement operation is then simulated. More specifically, the string to be processed is converted into the text content to be processed, which is then searched for and located in the file; the matching text content obtained by converting the matching string is then obtained, and the text content to be processed is replaced with the matching text content. This completes the conversion of the text content to be processed in the file and achieves shielding, modification or masking of important or sensitive information in the file.
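For a plain-text file, the find-and-replace part of step S40 might look like the sketch below. It assumes the file content is already available as a single string and that the mapping from text to be processed to matching text has been built in the earlier steps; `replace_in_text` and the sample values are hypothetical. Replacements are applied one after another, so this simple version does not guard against one replacement's output containing another key.

```python
def replace_in_text(text: str, mapping: dict[str, str]) -> str:
    """Replace each piece of text to be processed with its matching text."""
    # Handle longer keys first so that a shorter key cannot break up
    # a longer one before the longer one has been replaced.
    for original in sorted(mapping, key=len, reverse=True):
        text = text.replace(original, mapping[original])
    return text

document = "Contact Zhang San at 13800001111."
masked = replace_in_text(
    document,
    {"Zhang San": "Zhang **", "13800001111": "138****1111"},
)
print(masked)
```

Formats such as .doc or .pdf would need a format-specific reader and writer in place of the plain string used here.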
The method that a kind of pair of fixed content of text of document is handled is present embodiments provided, file is being got Afterwards, the content of text of file is handled, the content of text of file is formatted as character string one by one, thus by literary The content of text of part regards the character string dimension being made of character string one by one as, is then obtaining processing this document content of text just Then expression formula screens the character string to be processed for meeting regular expression requirement in the character string dimension, according to the canonical table The character for including to the character string to be processed up to formula is replaced, and generates the corresponding matching character of the character string to be processed The matched character string is carried out inverse format processing, the corresponding matched text content of the matched character string is determined, true by string Determine character string to be processed and after corresponding matched character string, obtains word to be processed described in the file with each character string to be processed The content of text to be processed is replaced with the corresponding matching text of the matched character string by the corresponding content of text to be processed of symbol string This content completes the conversion to the content of text to be processed in file, realizes to shielding important or sensitive information in file, repairs Change or mask processing, the file content processing of mass save the time, improve the treatment effeciency to content of text in file.
In one embodiment of the present invention, as shown in Fig. 2, after obtaining the text content to be processed that corresponds to the string to be processed in the file and replacing it with the matching text content corresponding to the matching string, the method further comprises:
S41: obtaining the original text format of the text content to be processed, and setting the text format of the matching text content to the original text format.
In practice, different files have different text formats, and even within one file different passages may have different text formats, including bold, different fonts, different colors and so on. In this embodiment, after the text content to be processed has been replaced with the matching text content corresponding to the matching string, the original text format of the text content to be processed is obtained, and the text format of the matching text content that replaced it is set to that original text format, so that after the setting the text format of the matching text content is identical to that of the text content to be processed. For example, if the text content to be processed was originally bold, the text content after replacement is still bold; if it was originally in No. 5 Song typeface, the text content after replacement is still in No. 5 Song typeface. The text content is thus replaced or modified without changing the original text format, so the overall text format of the file is unaffected.
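The idea behind step S41 can be shown with a small data model in which each run of text carries its own formatting. The `TextRun` class, its fields and its default values are purely illustrative assumptions; the description does not say how formatting is represented.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class TextRun:
    """A run of text plus the formatting attributes that apply to it."""
    text: str
    bold: bool = False
    font: str = "SimSun"      # illustrative default
    size_pt: float = 10.5     # illustrative default

def replace_keep_format(run: TextRun, new_text: str) -> TextRun:
    """Return a run with new text but the original formatting attributes."""
    return replace(run, text=new_text)

original = TextRun(text="Zhang San", bold=True)
masked_run = replace_keep_format(original, "***")
print(masked_run)
```

Only the `text` field changes, so bold, font and size are carried over from the text content to be processed to the matching text content.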
In one embodiment of the present invention, after screening the string array according to the regular expression to obtain the strings to be processed that satisfy the regular expression, the method further comprises:
saving the association among the regular expression, the string to be processed and the matching string to a configuration file;
receiving a modification to the regular expression in the configuration file and modifying the corresponding matching string accordingly, or
receiving a modification to the matching string in the configuration file and modifying the corresponding regular expression accordingly.
In practice, different departments may have different requirements for the content of a file, and may require different processing results for the same text content in the file. For example, department S1 needs the text content relating to personal names in file A to be replaced with the form "***", while department S2 needs the text content relating to personal names in file A to be replaced with the form "surname + **". In this embodiment, the association among the regular expression, the string to be processed and the matching string is saved to a configuration file, for example an xml configuration file. When different departments have different replacement requirements for the same text content in a file, only the xml configuration file needs to be modified, changing the regular expression or the matching string in it, so that different text content can be substituted according to different needs.
In one implementation of this embodiment, a modification to the regular expression in the configuration file is received, changing the replacement condition of the regular expression. After the modification, a different matching string can be generated from the string to be processed and the replacement condition of the regular expression, thereby modifying the matching string corresponding to the string to be processed. When the same text content to be processed needs to be replaced with different matching text content, the corresponding matching string can be changed by modifying the regular expression in the configuration file. In another implementation of this embodiment, a modification to the matching string in the configuration file is received and the corresponding regular expression is modified accordingly. When business personnel are not able to modify regular expressions, then once the string to be processed has been determined in the configuration file, the corresponding regular expression is modified by modifying the matching string, which ensures that the string to be processed can subsequently be replaced with the required matching string. The configuration file can thus be modified in real time, which improves the efficiency of processing text content in files and meets the file content processing needs of different business lines.
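The department example can be sketched with an xml configuration parsed by Python's `xml.etree.ElementTree`. The element and attribute names (`rules`, `rule`, `department`, `pattern`, `replacement`) are invented for this sketch, as the description does not give the layout of the configuration file; the name "Zhang San" is sample data.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical configuration: one rule per department.
CONFIG_XML = r"""
<rules>
  <rule department="S1" pattern="Zhang San" replacement="***"/>
  <rule department="S2" pattern="(Zhang) San" replacement="\1 **"/>
</rules>
"""

def load_rules(xml_text: str) -> dict[str, tuple[str, str]]:
    """Read (pattern, replacement) pairs keyed by department."""
    root = ET.fromstring(xml_text.strip())
    return {
        rule.get("department"): (rule.get("pattern"), rule.get("replacement"))
        for rule in root.iter("rule")
    }

def apply_rule(text: str, department: str, rules: dict[str, tuple[str, str]]) -> str:
    """Apply the department's rule to the text."""
    pattern, replacement = rules[department]
    return re.sub(pattern, replacement, text)

rules = load_rules(CONFIG_XML)
print(apply_rule("Signed by Zhang San", "S1", rules))
print(apply_rule("Signed by Zhang San", "S2", rules))
```

Changing what a department sees then only requires editing the rule in the configuration, not the processing code.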
In one embodiment of the present invention, after replacing the text content to be processed with the matching text content corresponding to the matching string, the method further comprises:
generating a text content replacement file from the replaced matching text content;
counting the accuracy of the text content replacement file, and determining a target regular expression according to the accuracy;
regenerating the text content replacement file according to the target regular expression.
In this embodiment, after the text content to be processed in the file has been replaced with the matching text content, a text content replacement file is generated from the replaced matching text content. In the early stage of machine processing, the error rate of text content replacement may be high. To further improve the accuracy of machine processing, after one round of processing the accuracy of the processed content of the text content replacement file is counted. When the accuracy is lower than a preset value, the regular expression used in that replacement is adjusted according to the accuracy, thereby determining the target regular expression; the information processing operation is then performed on the file again according to the target regular expression, regenerating the text content replacement file. For example, in the test phase, several different regular expressions are applied to the same text content of the same file, the accuracy of the processing result of each regular expression is counted, and the regular expressions are ranked by accuracy. In practical applications, the regular expression with the highest accuracy is used preferentially to process files; if its accuracy fails to meet the requirement in practice, it is replaced with another regular expression to improve the accuracy of file content processing.
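One way to picture the accuracy-based selection is the sketch below. The description does not define how accuracy is counted; here it is assumed, for illustration only, to be the share of labelled (input, expected output) samples that a rule reproduces exactly. The function names, the threshold and the sample data are all hypothetical.

```python
import re

def accuracy(pattern: str, replacement: str, samples: list[tuple[str, str]]) -> float:
    """Fraction of (input, expected) samples the rule reproduces exactly."""
    hits = sum(
        1 for source, expected in samples
        if re.sub(pattern, replacement, source) == expected
    )
    return hits / len(samples)

def pick_target_rule(rules, samples, threshold):
    """Rank rules by accuracy; return the best one if it meets the threshold."""
    ranked = sorted(rules, key=lambda r: accuracy(r[0], r[1], samples), reverse=True)
    best = ranked[0]
    return best if accuracy(best[0], best[1], samples) >= threshold else None

# Expected behaviour: mask 11-digit numbers, leave short numbers alone.
samples = [("tel 13800001111", "tel " + "*" * 11), ("id 42", "id 42")]
rules = [(r"\d", "*"), (r"\d{11}", "*" * 11)]
target = pick_target_rule(rules, samples, threshold=0.9)
print(target)
```

The first rule masks every digit and so fails on the second sample, while the second rule satisfies both and is selected as the target.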
A kind of embodiment of the invention, it is described that the character string dimension is screened according to regular expressions, obtain institute Before stating the character string to be processed for meeting the regular expression requirement in character string dimension, comprising:
The attribute information for obtaining the file, according to preset rules match it is corresponding with the attribute information of the file just Then expression formula.
In practical applications, the content that needs to be processed in a file differs as the business moves through different stages. In this embodiment, files at different stages are processed differently by using different regular expressions. Specifically, before the string array is screened according to the regular expression to obtain the to-be-processed strings in the string array that satisfy the regular expression, the attribute information of the file is obtained, and the regular expression corresponding to that attribute information is matched according to preset rules, thereby determining the regular expression currently required for files with different attribute information. In one embodiment, the attribute information includes the creation time of the file. The interval between the current time and the file creation time is determined, the regular expression corresponding to that interval is obtained, and files with different creation times are then processed according to the different regular expressions. For example, if the interval between the creation time of the file and the current time is greater than 6 months, regular expression A is matched, and text contents X and Y in the file are masked based on regular expression A; when the interval is less than 6 months, regular expression B is matched, text content Y in the file is masked based on regular expression B, and text content X is not masked. In one application scenario of this embodiment, for the disclosure material of a patent application, it is judged whether less than 12 months have elapsed since the disclosure material was created; if so, both the technical information and the inventor information are shielded. If more than 12 months have elapsed since the disclosure material was created, the patent document for the disclosure material has generally been published, so the technical information of the disclosure material does not need to be shielded and only the inventor information is shielded.
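The patent-disclosure scenario above can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the two patterns, the field labels "Inventor: " and "Technical solution: ", the function names, and the approximation of 12 months as 365 days are all assumptions introduced here.

```python
import re
from datetime import datetime, timedelta

# Hypothetical patterns; the source does not give concrete expressions.
INVENTOR_RE = re.compile(r"(?<=Inventor: )\w+")
TECH_RE = re.compile(r"(?<=Technical solution: ).+")

def select_patterns(created, now, threshold=timedelta(days=365)):
    """Choose the regular expressions from the file's age (its attribute information)."""
    if now - created < threshold:
        return [TECH_RE, INVENTOR_RE]   # assumed not yet published: shield both
    return [INVENTOR_RE]                # assumed published: shield inventor only

def mask(text, patterns):
    """Replace every match with asterisks of the same length."""
    for pattern in patterns:
        text = pattern.sub(lambda m: "*" * len(m.group(0)), text)
    return text
```

Calling `mask(doc, select_patterns(created, now))` applies whichever set of expressions the file's age selects, so the masking code itself does not need to know which stage the file is in.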
In one embodiment of the invention, before screening the string array according to the regular expression to obtain the to-be-processed strings in the string array that satisfy the regular expression, the method comprises:
Judging whether the file format of the file is a preset format, and if not, converting the file format into the preset format;
Obtaining the regular expression corresponding to the preset format.
In practice, the files to be processed come in a variety of file formats, such as .doc, .txt, and .pdf. To avoid the extra workload of developing different processing logic for files of different formats, in this embodiment, when the regular expression is determined, it is judged whether the file format of the file is the preset format; if not, the file format is converted into the preset format. A file in the preset format has a corresponding regular expression to match against, so the regular expression corresponding to the preset format is then obtained. Files of different formats can thus be handled by the same processing, which improves the efficiency of processing files of different file formats.
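The format check and conversion described above might look like the sketch below. It assumes .txt as the preset format and registers only a CSV converter, because the source does not say which format is preset or how .doc and .pdf files are converted; `PRESET_FORMAT`, `CONVERTERS`, `PATTERNS_BY_FORMAT`, and the 11-digit pattern are hypothetical names and values.

```python
import csv
import io
import re
from pathlib import Path

PRESET_FORMAT = ".txt"  # assumed preset format

# Hypothetical regular expression associated with the preset format.
PATTERNS_BY_FORMAT = {".txt": re.compile(r"\b\d{11}\b")}

def csv_to_text(raw):
    """Convert CSV content to plain text, one space-separated row per line."""
    rows = csv.reader(io.StringIO(raw))
    return "\n".join(" ".join(row) for row in rows)

# Converters from other formats to the preset format (only CSV is sketched).
CONVERTERS = {".csv": csv_to_text}

def normalise(filename, raw):
    """Return (text in the preset format, regular expression for that format)."""
    suffix = Path(filename).suffix.lower()
    if suffix != PRESET_FORMAT:
        if suffix not in CONVERTERS:
            raise ValueError(f"no converter registered for {suffix!r}")
        raw = CONVERTERS[suffix](raw)
    return raw, PATTERNS_BY_FORMAT[PRESET_FORMAT]
```

Supporting a further format then means registering one more converter; the regular expression and the downstream processing stay unchanged.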
As shown in Fig. 3, in another embodiment, the present invention provides an information processing device, comprising:
Formatting module 10: configured to obtain a file containing text content, and to format the text content of the file into a string array;
Matching module 20: configured to screen the string array according to a regular expression to obtain the to-be-processed strings in the string array that satisfy the regular expression;
Determining module 30: configured to replace the characters contained in the to-be-processed string according to the regular expression to generate the matched string corresponding to the to-be-processed string, and to reverse-format the matched string to determine the matched text content corresponding to the matched string;
Replacement module 40: configured to obtain the to-be-processed text content in the file that corresponds to the to-be-processed string, and to replace the to-be-processed text content with the matched text content corresponding to the matched string.
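The cooperation of the four modules can be illustrated with the sketch below. It assumes that formatting means splitting the text into lines and that replacement means masking each match with asterisks; the source fixes neither choice, and the function names are hypothetical.

```python
import re

def format_text(text):
    """Formatting module: turn the text content into a string array (one string per line)."""
    return text.splitlines()

def match(strings, pattern):
    """Matching module: indices of the strings that satisfy the regular expression."""
    return [i for i, s in enumerate(strings) if pattern.search(s)]

def determine(strings, indices, pattern, mask_char="*"):
    """Determining module: build the matched string for each to-be-processed string.

    With line-based formatting, reverse formatting is the identity, so the
    matched string is also the matched text content.
    """
    return {
        i: pattern.sub(lambda m: mask_char * len(m.group(0)), strings[i])
        for i in indices
    }

def replace(strings, matched):
    """Replacement module: swap in the matched text content and rebuild the file text."""
    return "\n".join(matched.get(i, s) for i, s in enumerate(strings))

def process(text, pattern):
    """Run the four modules in order."""
    strings = format_text(text)
    indices = match(strings, pattern)
    return replace(strings, determine(strings, indices, pattern))
```

For example, `process("card: 1234-5678", re.compile(r"\d{4}-\d{4}"))` masks the card number while leaving the label in place.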
In one embodiment of the invention, the replacement module 40 is further configured to perform:
Obtaining the original text format of the to-be-processed text content, and setting the text format of the matched text content to the original text format.
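One way to picture the format-preserving step is to model a piece of formatted text as a run carrying its formatting attributes. The `Run` type and its `font` and `bold` fields are assumptions made for illustration; the source does not specify how text format is represented.

```python
from dataclasses import dataclass, replace as dc_replace

@dataclass(frozen=True)
class Run:
    """A span of text together with its (hypothetical) formatting attributes."""
    text: str
    font: str
    bold: bool

def replace_keep_format(run, matched_text):
    """Return a run with the new text and the original run's formatting."""
    # Every formatting attribute is copied from the original run; only the text changes.
    return dc_replace(run, text=matched_text)
```

Because only the `text` field is overwritten, the replacement text appears in the file with the same font and weight as the content it replaces.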
In one embodiment of the invention, the device further comprises:
Configuration module: configured to save the association among the regular expression, the to-be-processed string, and the matched string to a configuration file; and to receive a modification to a regular expression in the configuration file and modify the corresponding matched string, or receive a modification to a matched string in the configuration file and modify the corresponding regular expression.
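A configuration file of this kind could be kept as JSON, as in the sketch below. The JSON layout, the key names `regex`, `pending`, and `matched`, and the asterisk masking are assumptions. Only the first direction (editing a regular expression and recomputing the matched string) is sketched, because the source does not describe how a regular expression would be derived from an edited matched string.

```python
import json
import re

def make_entry(pattern, pending, mask_char="*"):
    """Associate a regular expression, a to-be-processed string, and its matched string."""
    matched = re.sub(pattern, lambda m: mask_char * len(m.group(0)), pending)
    return {"regex": pattern, "pending": pending, "matched": matched}

def save(entries, path):
    """Write the associations to the configuration file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(entries, fh, ensure_ascii=False, indent=2)

def load(path):
    """Read the associations back from the configuration file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

def update_regex(entries, index, new_pattern):
    """Apply an edit to a stored regular expression and recompute its matched string."""
    entries[index] = make_entry(new_pattern, entries[index]["pending"])
    return entries
```

Keeping the to-be-processed string in each entry is what makes the recomputation possible: the edited expression is simply re-applied to the stored original.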
In one embodiment of the invention, the device further comprises:
Adjustment module: configured to generate a text content replacement file according to the replaced matched text content; to compute the accuracy of the text content replacement file and determine a target regular expression according to the accuracy; and to regenerate the text content replacement file according to the target regular expression.
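The accuracy-driven adjustment can be sketched as follows. The source does not define how accuracy is computed; this sketch assumes it is the fraction of sample lines whose replacement equals a manually prepared expected result, and that the target regular expression is the candidate with the highest accuracy. All names are hypothetical.

```python
import re

def mask_with(pattern, line):
    """Replace every match of the pattern in the line with asterisks."""
    return re.sub(pattern, lambda m: "*" * len(m.group(0)), line)

def accuracy(pattern, samples):
    """Fraction of (input, expected) pairs that the pattern reproduces exactly."""
    hits = sum(mask_with(pattern, src) == want for src, want in samples)
    return hits / len(samples)

def pick_target(candidates, samples):
    """Choose the candidate regular expression with the highest accuracy."""
    return max(candidates, key=lambda p: accuracy(p, samples))

def regenerate(lines, pattern):
    """Regenerate the replacement output using the target regular expression."""
    return [mask_with(pattern, line) for line in lines]
```

With samples in which a six-digit ID must be masked but a two-digit room number must not, the pattern `\d{6}` scores higher than `\d+` and would be chosen as the target.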
In one embodiment of the invention, the matching module 20 is further configured to perform:
Obtaining attribute information of the file, and matching, according to preset rules, the regular expression corresponding to the attribute information of the file.
In one embodiment of the invention, the matching module 20 is further configured to perform:
Judging whether the file format of the file is a preset format, and if not, converting the file format into the preset format; and obtaining the regular expression corresponding to the preset format.
In another embodiment, the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the information processing method described in the above embodiments. The computer-readable storage medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical discs, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards. That is, a storage device includes any medium that stores or transmits information in a form readable by a device (for example, a computer or a mobile phone), and may be a read-only memory, a magnetic disk, an optical disc, or the like.
The computer-readable storage medium provided by the embodiment of the present invention can implement the following: obtaining a file containing text content, and formatting the text content of the file into a string array; screening the string array according to a regular expression to obtain the to-be-processed strings in the string array that satisfy the regular expression; replacing the characters contained in the to-be-processed string according to the regular expression to generate the matched string corresponding to the to-be-processed string, and reverse-formatting the matched string to determine the matched text content corresponding to the matched string; and obtaining the to-be-processed text content in the file that corresponds to the to-be-processed string, and replacing the to-be-processed text content with the matched text content corresponding to the matched string. This provides a method for processing specified text content in a file. After a file is obtained, its text content is formatted into individual strings, so that the text content of the file is treated as a string array made up of those strings. The regular expression for processing the text content of the file is then obtained, and the to-be-processed strings in the string array that satisfy the regular expression are screened out. The characters contained in each to-be-processed string are replaced according to the regular expression to generate the corresponding matched string, and the matched string is reverse-formatted to determine the corresponding matched text content. Once each to-be-processed string and its corresponding matched string have been determined, the to-be-processed text content in the file that corresponds to the to-be-processed string is obtained and replaced with the matched text content corresponding to the matched string. This completes the conversion of the to-be-processed text content in the file and shields, modifies, or masks important or sensitive information in the file. Processing file content in batches saves time and improves the efficiency of processing text content in files.
The computer-readable storage medium provided by the embodiment of the present invention can implement the embodiments of the above information processing method. For the specific functions, refer to the description in the method embodiments; details are not repeated here.
In addition, in another embodiment, the present invention also provides a terminal device. As shown in Fig. 4, the terminal device includes components such as a processor 403, a memory 405, an input unit 407, and a display unit 409. Those skilled in the art will understand that the structure shown in Fig. 4 does not limit all terminal devices, which may include more or fewer components than illustrated, or combine certain components. The memory 405 can be used to store the computer program 401 and the functional modules, and the processor 403 runs the computer program 401 stored in the memory 405, thereby executing the various functional applications and data processing of the device. The memory 405 may be an internal memory or an external memory, or include both. The internal memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, or random access memory. The external memory may include a hard disk, floppy disk, ZIP disk, USB flash drive, magnetic tape, and the like. The memory disclosed in the present invention includes, but is not limited to, these types of memory. The memory 405 disclosed in the present invention is given only as an example and not as a limitation.
The input unit 407 is used to receive signal input and input from the user. The input unit 407 may include a touch panel and other input devices. The touch panel collects the user's touch operations on or near it (such as operations performed by the user on or near the touch panel with a finger, stylus, or any other suitable object or accessory) and drives the corresponding connected device according to a preset program. Other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as playback control keys and switch keys), a trackball, a mouse, a joystick, and the like. The display unit 409 can be used to display information entered by the user or information provided to the user, as well as the various menus of the computer device. The display unit 409 may take the form of a liquid crystal display, an organic light-emitting diode, or the like. The processor 403 is the control center of the computer device. It connects the various parts of the entire computer through various interfaces and lines, and performs various functions and processes data by running or executing the software programs and/or modules stored in the memory 405 and calling the data stored in the memory.
In one embodiment, the terminal device includes one or more processors 403, one or more memories 405, and one or more computer programs 401, wherein the one or more computer programs 401 are stored in the memory 405 and configured to be executed by the one or more processors 403, and the one or more computer programs 401 are configured to perform the file text content processing method described in the above embodiments. The one or more processors 403 shown in Fig. 4 can execute and implement the functions of the formatting module 10, matching module 20, determining module 30, and replacement module 40 shown in Fig. 3.
The terminal device provided by the embodiment of the present invention can implement the following: obtaining a file containing text content, and formatting the text content of the file into a string array; screening the string array according to a regular expression to obtain the to-be-processed strings in the string array that satisfy the regular expression; replacing the characters contained in the to-be-processed string according to the regular expression to generate the matched string corresponding to the to-be-processed string, and reverse-formatting the matched string to determine the matched text content corresponding to the matched string; and obtaining the to-be-processed text content in the file that corresponds to the to-be-processed string, and replacing the to-be-processed text content with the matched text content corresponding to the matched string. This provides a method for processing specified text content in a file. After a file is obtained, its text content is formatted into individual strings, so that the text content of the file is treated as a string array made up of those strings. The regular expression for processing the text content of the file is then obtained, and the to-be-processed strings in the string array that satisfy the regular expression are screened out. The characters contained in each to-be-processed string are replaced according to the regular expression to generate the corresponding matched string, and the matched string is reverse-formatted to determine the corresponding matched text content. Once each to-be-processed string and its corresponding matched string have been determined, the to-be-processed text content in the file that corresponds to the to-be-processed string is obtained and replaced with the matched text content corresponding to the matched string. This completes the conversion of the to-be-processed text content in the file and shields, modifies, or masks important or sensitive information in the file. Processing file content in batches saves time and improves the efficiency of processing text content in files.
The terminal device provided by the embodiment of the present invention can implement the embodiments of the information processing method provided above. For the specific functions, refer to the description in the method embodiments; details are not repeated here.
The above are only some embodiments of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. An information processing method, characterized by comprising:
Obtaining a file containing text content, and formatting the text content of the file into a string array;
Screening the string array according to a regular expression to obtain the to-be-processed strings in the string array that satisfy the regular expression;
Replacing the characters contained in the to-be-processed string according to the regular expression to generate the matched string corresponding to the to-be-processed string, and reverse-formatting the matched string to determine the matched text content corresponding to the matched string;
Obtaining the to-be-processed text content in the file that corresponds to the to-be-processed string, and replacing the to-be-processed text content with the matched text content corresponding to the matched string.
2. The method according to claim 1, characterized in that, after obtaining the to-be-processed text content in the file that corresponds to the to-be-processed string and replacing the to-be-processed text content with the matched text content corresponding to the matched string, the method further comprises:
Obtaining the original text format of the to-be-processed text content, and setting the text format of the matched text content to the original text format.
3. The method according to claim 1, characterized in that, after screening the string array according to the regular expression to obtain the to-be-processed strings in the string array that satisfy the regular expression, the method further comprises:
Saving the association among the regular expression, the to-be-processed string, and the matched string to a configuration file;
Receiving a modification to a regular expression in the configuration file, and modifying the corresponding matched string; or
Receiving a modification to a matched string in the configuration file, and modifying the corresponding regular expression.
4. The method according to claim 1, characterized in that, after replacing the to-be-processed text content with the matched text content corresponding to the matched string, the method further comprises:
Generating a text content replacement file according to the replaced matched text content;
Computing the accuracy of the text content replacement file, and determining a target regular expression according to the accuracy;
Regenerating the text content replacement file according to the target regular expression.
5. The method according to claim 1, characterized in that, before screening the string array according to the regular expression to obtain the to-be-processed strings in the string array that satisfy the regular expression, the method comprises:
Obtaining attribute information of the file, and matching, according to preset rules, the regular expression corresponding to the attribute information of the file.
6. The method according to claim 1, characterized in that, before screening the string array according to the regular expression to obtain the to-be-processed strings in the string array that satisfy the regular expression, the method comprises:
Judging whether the file format of the file is a preset format, and if not, converting the file format into the preset format;
Obtaining the regular expression corresponding to the preset format.
7. An information processing device, characterized by comprising:
Formatting module: configured to obtain a file containing text content, and to format the text content of the file into a string array;
Matching module: configured to screen the string array according to a regular expression to obtain the to-be-processed strings in the string array that satisfy the regular expression;
Determining module: configured to replace the characters contained in the to-be-processed string according to the regular expression to generate the matched string corresponding to the to-be-processed string, and to reverse-format the matched string to determine the matched text content corresponding to the matched string;
Replacement module: configured to obtain the to-be-processed text content in the file that corresponds to the to-be-processed string, and to replace the to-be-processed text content with the matched text content corresponding to the matched string.
8. The device according to claim 7, characterized in that the replacement module is further configured to perform:
Obtaining the original text format of the to-be-processed text content, and setting the text format of the matched text content to the original text format.
9. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the information processing method according to any one of claims 1 to 6.
10. A terminal device, characterized by comprising:
One or more processors;
A memory;
One or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more computer programs are configured to perform the information processing method according to any one of claims 1 to 6.
CN201910575551.3A 2019-06-28 2019-06-28 Information processing method, device, storage medium and terminal device Pending CN110427773A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910575551.3A CN110427773A (en) 2019-06-28 2019-06-28 Information processing method, device, storage medium and terminal device
PCT/CN2019/103028 WO2020258492A1 (en) 2019-06-28 2019-08-28 Information processing method and apparatus, storage medium and terminal device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910575551.3A CN110427773A (en) 2019-06-28 2019-06-28 Information processing method, device, storage medium and terminal device

Publications (1)

Publication Number Publication Date
CN110427773A true CN110427773A (en) 2019-11-08

Family

ID=68409929

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910575551.3A Pending CN110427773A (en) 2019-06-28 2019-06-28 Information processing method, device, storage medium and terminal device

Country Status (2)

Country Link
CN (1) CN110427773A (en)
WO (1) WO2020258492A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110941946A (en) * 2019-11-29 2020-03-31 西安四叶草信息技术有限公司 Information extraction method, device, equipment and storage medium
CN111090671A (en) * 2019-12-19 2020-05-01 山大地纬软件股份有限公司 Method and device for eliminating difference between hollow character string and invalid character string in database
CN113378518A (en) * 2021-05-17 2021-09-10 广东广宇科技发展有限公司 Regular expression-based JSON data format replacement method, system and storage medium
CN114398578A (en) * 2021-12-23 2022-04-26 网易有道信息技术(北京)有限公司 Method for preprocessing HTML character string and related product
CN114697311A (en) * 2020-12-31 2022-07-01 中国移动通信有限公司研究院 File processing method, device, equipment and storage medium
CN113378518B (en) * 2021-05-17 2024-06-11 广东广宇科技发展有限公司 Regular expression-based JSON data format replacement method, system and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012073794A (en) * 2010-09-28 2012-04-12 Fujitsu Ltd Character string selection method, character string selection program, and character string selection device
CN103455307A (en) * 2012-05-29 2013-12-18 腾讯科技(深圳)有限公司 Method and device for processing information output by command line
CN109684469A (en) * 2018-12-13 2019-04-26 平安科技(深圳)有限公司 Filtering sensitive words method, apparatus, computer equipment and storage medium
CN109829328A (en) * 2018-12-19 2019-05-31 上海晶赞融宣科技有限公司 Data desensitization, inverse desensitization method and device, storage medium, terminal

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060265357A1 (en) * 2005-04-26 2006-11-23 Potts Matthew P Method of efficiently parsing a file for a plurality of strings
CN105701074A (en) * 2016-01-04 2016-06-22 北京京东尚科信息技术有限公司 Character processing method and apparatus
CN107329957B (en) * 2017-05-18 2020-08-18 网易(杭州)网络有限公司 Method for replacing code Chinese character string and computer readable storage medium
CN109376547A (en) * 2018-09-29 2019-02-22 北京邮电大学 Information protection method and system based on file path

Also Published As

Publication number Publication date
WO2020258492A1 (en) 2020-12-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination