CN111143882A - Information processing method and device - Google Patents

Publication number
CN111143882A
Authority
CN
China
Prior art keywords
information
preset
processed
address
keywords
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911413820.2A
Other languages
Chinese (zh)
Inventor
郑永升
石磊
Other inventors have requested that their names not be disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yitu Medical Technology Co ltd
Original Assignee
Hangzhou Yitu Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yitu Medical Technology Co ltd filed Critical Hangzhou Yitu Medical Technology Co ltd
Priority to CN201911413820.2A priority Critical patent/CN111143882A/en
Publication of CN111143882A publication Critical patent/CN111143882A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification

Abstract

The application discloses an information processing method and device, which are used for solving the problem that in the prior art, the information of an unstructured text cannot be desensitized, so that the labor cost and the time cost are increased. The method comprises the following steps: when receiving information to be processed, judging whether the information to be processed is structured text information or not; when the information to be processed is unstructured text information, extracting preset keywords from the structured text information, wherein the structured text information is associated with the unstructured text information; judging whether the preset keywords exist in the information to be processed or not; and when the preset keywords exist in the information to be processed, shielding the specific contents of the preset keywords in the information to be processed in a preset mode. By adopting the scheme provided by the application, the automatic shielding of the unstructured text information is realized, so that the labor cost and the time cost are reduced.

Description

Information processing method and device
Technical Field
The present application relates to the field of computers, and in particular, to an information processing method and apparatus.
Background
At present, large amounts of data, such as medical data (for example, hospital medical record texts), need to be integrated for scientific research. However, hospital medical data contains a large amount of sensitive information, such as user names, user addresses, and mobile phone numbers, and leakage of this information may cause unnecessary trouble for users. The medical data must therefore be desensitized before it can meet the requirements of scientific research. Desensitization refers to transforming the sensitive information in data according to desensitization rules so that the sensitive information is shielded. Where client security data or commercially sensitive data is involved, the real data is modified, without violating system rules, before being provided for test use; personal information such as identification numbers, mobile phone numbers, card numbers, and client numbers requires data desensitization.
Both structured text and unstructured text contain sensitive information; however, in the prior art, only sensitive information in structured text can be desensitized automatically. Unstructured text can be desensitized only manually, and manually desensitizing massive amounts of unstructured text takes a great deal of time, which increases labor cost and time cost.
Disclosure of Invention
An object of the embodiments of the present application is to provide an information processing method and apparatus, so as to solve the problem that in the prior art, desensitization cannot be performed on information of an unstructured text, which increases labor cost and time cost.
In order to solve the technical problem, the embodiment of the application adopts the following technical scheme: an information processing method comprising:
when receiving information to be processed, judging whether the information to be processed is structured text information or not;
when the information to be processed is unstructured text information, extracting preset keywords from the structured text information, wherein the structured text information is associated with the unstructured text information;
judging whether the preset keywords exist in the information to be processed or not;
and when the preset keywords exist in the information to be processed, shielding the specific contents of the preset keywords in the information to be processed in a preset mode.
The beneficial effect of this application lies in: when the information to be processed is unstructured text information, preset keywords are extracted from the structured text information associated with the unstructured text; when it is determined that the preset keywords exist in the information to be processed, the specific contents of the preset keywords in the information to be processed are shielded in a preset mode, so that automatic shielding of the unstructured text information is realized and the labor cost and the time cost are reduced.
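The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `is_structured` and `desensitize`, the rule that a field-to-value mapping counts as structured text, and the default same-length `*` mask are all hypothetical choices made for this example.

```python
from typing import Callable, Dict, Iterable, Union

Record = Dict[str, str]  # assumed shape of one structured record: field -> value


def is_structured(info: Union[str, Record]) -> bool:
    """Step 1 (assumption): a field -> value mapping is treated as structured text."""
    return isinstance(info, dict)


def desensitize(
    info: Union[str, Record],
    associated: Record,
    preset_fields: Iterable[str],
    masker: Callable[[str], str] = lambda kw: "*" * len(kw),
) -> Union[str, Record]:
    if is_structured(info):
        return info  # structured input is outside the scope of this sketch
    # Step 2: take the preset keywords from the associated structured record.
    keywords = [associated[f] for f in preset_fields if associated.get(f)]
    for keyword in keywords:
        # Step 3: check whether the keyword occurs in the unstructured text.
        if keyword in info:
            # Step 4: shield its content in a preset manner (here: the masker).
            info = info.replace(keyword, masker(keyword))
    return info


record = {"name": "Zhang San", "phone": "1XXXXXXXXXX"}
note = "Zhang San, telephone number 1XXXXXXXXXX, was admitted on December 31, 2016."
print(desensitize(note, record, ["name", "phone"]))
```

Passing the masking rule as a callable keeps the sketch open to the keyword-specific rules (address, birth date, other keywords) described in the embodiments that follow.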
In one embodiment, the extracting the preset keyword from the structured text information includes:
acquiring a preset field from the structured text information;
and extracting by taking the information corresponding to the preset field as a preset keyword.
In one embodiment, when the preset keyword is an address, a shielding operation is performed on specific content of the preset keyword in the information to be processed in a preset manner, where the shielding operation includes:
acquiring address information in the information to be processed;
judging whether the address information contains keywords related to administrative division information or not;
and when the address information contains keywords related to administrative division information, modifying the address information according to a preset regular expression so that only the address related to the administrative division is retained in the address information.
The beneficial effect of this embodiment lies in: when the address information is shielded, only the detailed part of the home address is shielded and the part of the address related to the administrative division is retained, so that the desensitized information remains relatively complete while the user information is protected.
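One way to picture the regular-expression step is the sketch below. The patent does not disclose the actual expression or keyword list, so the keywords (`Province`, `City`, `County`), the English address layout, and the function names are assumptions for illustration only.

```python
import re

# Hypothetical keyword list; the source does not enumerate the keywords.
ADMIN_KEYWORDS = ("Province", "City", "County")
_KW = "|".join(ADMIN_KEYWORDS)

# Matches a leading run of "<Name> <Keyword>" pairs, e.g. "Hebei Province Baoding City".
_ADMIN_PREFIX = re.compile(rf"^(?:\s*\w+ (?:{_KW})\b)+")


def contains_admin_keyword(address: str) -> bool:
    """Judge whether the address contains an administrative-division keyword."""
    return re.search(rf"\b(?:{_KW})\b", address) is not None


def keep_admin_part(address: str) -> str:
    """Retain only the administrative-division prefix; drop the detailed address."""
    match = _ADMIN_PREFIX.match(address)
    return match.group(0).strip() if match else ""


address = "Hebei Province Baoding City XX Community Building 3 Unit 2"
if contains_admin_keyword(address):
    print(keep_admin_part(address))
```

An address with no recognizable keyword yields an empty string here; the next embodiment describes how such addresses are handled with an address dictionary.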
In one embodiment, the method further comprises:
when the address information does not contain keywords related to administrative division information, converting the administrative division information in the address information according to an address dictionary so that the converted address information contains the administrative division information;
and modifying the address information according to a preset regular expression so that only the address related to the administrative division is retained in the address information.
The beneficial effect of this embodiment lies in: when the address information does not contain keywords related to administrative division information, the administrative division information in the address information can be converted according to an address dictionary, so that the converted address information contains the administrative division information; therefore, the situation that the address information cannot be effectively shielded because the address information does not contain the keywords related to the administrative district information is avoided.
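The dictionary-based conversion can be sketched as below, assuming the address dictionary maps a bare place name to its full form including the administrative-division keyword. The dictionary contents, the keyword list, and the name `expand_address` are hypothetical; the source does not specify the dictionary layout.

```python
import re

# Hypothetical address dictionary: bare name -> name with its division keyword.
ADDRESS_DICT = {"Hebei": "Hebei Province", "Baoding": "Baoding City"}

_NAMES = "|".join(re.escape(name) for name in ADDRESS_DICT)
# Match a bare name only when it is not already followed by a division keyword.
_BARE_NAME = re.compile(rf"\b(?:{_NAMES})\b(?! (?:Province|City|County)\b)")


def expand_address(address: str) -> str:
    """Rewrite bare division names so the address carries division keywords."""
    return _BARE_NAME.sub(lambda m: ADDRESS_DICT[m.group(0)], address)


print(expand_address("Hebei Baoding XX Community Building 3"))
```

After this conversion the address contains administrative-division keywords, so the preset regular expression can be applied to it in the same way as to an address that contained them from the start.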
In one embodiment, the address dictionary is constructed by:
acquiring all administrative division information and the membership relationships between the administrative divisions;
and constructing the address dictionary according to the acquired administrative division information and the membership relationships between the administrative divisions, wherein the smallest administrative division unit in the address dictionary is a county-level administrative division, and the largest administrative division unit in the address dictionary is a provincial-level administrative division.
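A minimal sketch of the dictionary construction is given below. It assumes the membership relationships are available as `(name, level, parent)` tuples and that the dictionary maps each division to its chain of parents; both the input shape and the sample rows (including the placeholder "XX County" and "XX Town") are assumptions, not data from the source.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Division = Tuple[str, str, Optional[str]]  # (name, level, parent name or None)

# Only provincial-level down to county-level divisions are kept.
LEVELS = ("province", "city", "county")


def build_address_dictionary(divisions: Iterable[Division]) -> Dict[str, List[str]]:
    """Map each division to its full chain, from province down to itself."""
    parent_of = {name: parent for name, level, parent in divisions if level in LEVELS}
    dictionary: Dict[str, List[str]] = {}
    for name in parent_of:
        chain = [name]
        parent = parent_of[name]
        while parent is not None:  # assumes the membership data has no cycles
            chain.append(parent)
            parent = parent_of.get(parent)
        dictionary[name] = list(reversed(chain))
    return dictionary


sample = [
    ("Hebei Province", "province", None),
    ("Baoding City", "city", "Hebei Province"),
    ("XX County", "county", "Baoding City"),
    ("XX Town", "township", "XX County"),  # below county level, so excluded
]
address_dictionary = build_address_dictionary(sample)
print(address_dictionary["XX County"])
```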
In one embodiment, when the preset keyword is a birth date, the shielding operation is performed on the specific content of the preset keyword in the information to be processed in a preset manner, and the shielding operation includes:
hiding all information other than the year in the birth date according to a preset time format.
The beneficial effect of this embodiment lies in: according to a preset time format, all information other than the year in the birth date is hidden, so that the user's specific date of birth is shielded but the year of birth is retained; the desensitized information therefore remains relatively complete while the user information is protected.
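The birth-date rule can be illustrated with the sketch below. The time format `%Y-%m-%d`, the function name `mask_birth_date`, and the choice to leave only the four-digit year in the text are assumptions; the source says only that a preset time format is used and that the year is kept.

```python
from datetime import datetime


def mask_birth_date(text: str, birth_date: str, fmt: str = "%Y-%m-%d") -> str:
    """Replace the full birth date in the text with its year only."""
    # Parsing with the preset format validates the keyword and isolates the year.
    parsed = datetime.strptime(birth_date, fmt)
    return text.replace(birth_date, f"{parsed.year:04d}")


print(mask_birth_date("born on 1960-12-30, admitted later", "1960-12-30"))
```

A different preset format can be supplied through `fmt`, for example `"%d/%m/%Y"` for dates written day first.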
In one embodiment, when the preset keyword is another keyword except for an address and a birth date, the shielding operation is performed on specific content of the preset keyword in the information to be processed in a preset manner, and the shielding operation includes:
replacing preset keywords in the information to be processed through preset characters; wherein the other keywords except the address and the birth date comprise at least one of the following keywords:
name, landline number, mobile phone number, and mailbox address.
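For the remaining keyword types, replacement with preset characters might look like the sketch below. The fixed `***` token, the function name, and the sample values (including the `example.com` mailbox) are hypothetical; the source does not say which characters are used or whether the mask preserves length.

```python
from typing import Dict


def replace_keywords(text: str, keywords: Dict[str, str], preset: str = "***") -> str:
    """Replace every keyword value found in the text with the preset characters."""
    # Longer values first, so a value that contains another value is replaced whole.
    for value in sorted(keywords.values(), key=len, reverse=True):
        if value:
            text = text.replace(value, preset)
    return text


keywords = {
    "name": "Zhang San",
    "mobile": "1XXXXXXXXXX",
    "mailbox": "zhangsan@example.com",
}
print(replace_keywords("Zhang San (zhangsan@example.com), mobile 1XXXXXXXXXX", keywords))
```

A fixed-length token is used here so that the mask does not reveal the length of the original value; a same-length mask would be an equally valid reading of "preset characters".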
The present application also provides an information processing apparatus including:
the first judgment module is used for judging whether the information to be processed is structured text information or not when the information to be processed is received;
the extraction module is used for extracting preset keywords from the structured text information when the information to be processed is unstructured text information, and the structured text information is associated with the unstructured text information;
the second judgment module is used for judging whether the preset keywords exist in the information to be processed or not;
and the shielding module is used for shielding the specific content of the preset keyword in the information to be processed in a preset mode when the preset keyword is determined to exist in the information to be processed.
In one embodiment, the extraction module includes:
the first obtaining submodule is used for obtaining a preset field from the structured text information;
and the extraction submodule is used for extracting by taking the information corresponding to the preset field as a preset keyword.
In one embodiment, the shielding module includes:
the second obtaining sub-module is used for obtaining address information in the information to be processed when the preset keyword is an address;
the judging submodule is used for judging whether the address information contains keywords related to administrative division information or not;
and the modifying submodule is used for modifying the address information according to a preset regular expression when the address information contains keywords related to the administrative division information, so that only the address related to the administrative division is retained in the address information.
In one embodiment, the apparatus further comprises:
the conversion module is used for converting the administrative division information in the address information according to an address dictionary when the address information does not contain keywords related to the administrative division information, so that the converted address information contains the administrative division information;
and the modification module is used for modifying the address information according to a preset regular expression so that only the address related to the administrative division is retained in the address information.
In one embodiment, the address dictionary is constructed by:
acquiring all administrative division information and the membership relationships between the administrative divisions;
and constructing the address dictionary according to the acquired administrative division information and the membership relationships between the administrative divisions, wherein the smallest administrative division unit in the address dictionary is a county-level administrative division, and the largest administrative division unit in the address dictionary is a provincial-level administrative division.
In one embodiment, the shielding module includes:
and the deleting submodule is used for hiding other information except the year information in the birth date according to a preset time format when the preset keyword is the birth date.
In one embodiment, the shielding module includes:
the replacing sub-module is used for replacing the preset keywords in the information to be processed through preset characters when the preset keywords are other keywords except addresses and birth dates; wherein the other keywords except the address and the birth date comprise at least one of the following keywords:
name, landline number, mobile phone number, and mailbox address.
The present application also provides an information processing apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor executes the executable instructions to implement the steps of:
when receiving information to be processed, judging whether the information to be processed is structured text information or not;
when the information to be processed is unstructured text information, extracting preset keywords from the structured text information, wherein the structured text information is associated with the unstructured text information;
judging whether the preset keywords exist in the information to be processed or not;
and when the preset keywords exist in the information to be processed, shielding the specific contents of the preset keywords in the information to be processed in a preset mode.
In one embodiment, the processor further executes the executable instructions to implement the steps of:
the extracting of the preset keywords from the structured text information includes:
acquiring a preset field from the structured text information;
and extracting by taking the information corresponding to the preset field as a preset keyword.
In one embodiment, the processor further executes the executable instructions to implement the steps of:
when the preset keyword is an address, shielding the specific content of the preset keyword in the information to be processed in a preset mode, wherein the shielding operation comprises the following steps:
acquiring address information in the information to be processed;
judging whether the address information contains keywords related to administrative division information or not;
and when the address information contains keywords related to administrative division information, modifying the address information according to a preset regular expression so that only the address related to the administrative division is retained in the address information.
In one embodiment, the processor further executes the executable instructions to implement the steps of:
when the address information does not contain keywords related to administrative division information, converting the administrative division information in the address information according to an address dictionary so that the converted address information contains the administrative division information;
and modifying the address information according to a preset regular expression so that only the address related to the administrative division is retained in the address information.
In one embodiment, the processor further executes the executable instructions to perform the steps of:
the address dictionary is constructed in the following way:
acquiring all administrative division information and the membership relationships between the administrative divisions;
and constructing the address dictionary according to the acquired administrative division information and the membership relationships between the administrative divisions, wherein the smallest administrative division unit in the address dictionary is a county-level administrative division, and the largest administrative division unit in the address dictionary is a provincial-level administrative division.
In one embodiment, the processor further executes the executable instructions to implement the steps of:
when the preset keyword is a birth date, shielding the specific content of the preset keyword in the information to be processed in a preset mode, wherein the shielding operation comprises the following steps:
hiding all information other than the year in the birth date according to a preset time format.
In one embodiment, the processor further executes the executable instructions to implement the steps of:
when the preset keywords are other keywords except addresses and birth dates, shielding specific contents of the preset keywords in the information to be processed in a preset mode, wherein the shielding operation comprises the following steps:
replacing preset keywords in the information to be processed through preset characters; wherein the other keywords except the address and the birth date comprise at least one of the following keywords:
name, landline number, mobile phone number, and mailbox address.
The present application also provides a non-transitory readable storage medium in which instructions, when executed by a processor within a device, enable the device to perform a method of information processing, the method comprising:
when receiving information to be processed, judging whether the information to be processed is structured text information or not;
when the information to be processed is unstructured text information, extracting preset keywords from the structured text information, wherein the structured text information is associated with the unstructured text information;
judging whether the preset keywords exist in the information to be processed or not;
and when the preset keywords exist in the information to be processed, shielding the specific contents of the preset keywords in the information to be processed in a preset mode.
In one embodiment, the instructions in the storage medium further comprise:
the extracting of the preset keywords from the structured text information includes:
acquiring a preset field from the structured text information;
and extracting by taking the information corresponding to the preset field as a preset keyword.
In one embodiment, the instructions in the storage medium further comprise:
when the preset keyword is an address, shielding the specific content of the preset keyword in the information to be processed in a preset mode, wherein the shielding operation comprises the following steps:
acquiring address information in the information to be processed;
judging whether the address information contains keywords related to administrative division information or not;
and when the address information contains keywords related to administrative division information, modifying the address information according to a preset regular expression so that only the address related to the administrative division is retained in the address information.
In one embodiment, the instructions in the storage medium further comprise:
when the address information does not contain keywords related to administrative division information, converting the administrative division information in the address information according to an address dictionary so that the converted address information contains the administrative division information;
and modifying the address information according to a preset regular expression so that only the address related to the administrative division is retained in the address information.
In one embodiment, the instructions in the storage medium further comprise:
the address dictionary is constructed in the following way:
acquiring all administrative division information and the membership relationships between the administrative divisions;
and constructing the address dictionary according to the acquired administrative division information and the membership relationships between the administrative divisions, wherein the smallest administrative division unit in the address dictionary is a county-level administrative division, and the largest administrative division unit in the address dictionary is a provincial-level administrative division.
In one embodiment, the instructions in the storage medium further comprise:
when the preset keyword is a birth date, shielding the specific content of the preset keyword in the information to be processed in a preset mode, wherein the shielding operation comprises the following steps:
hiding all information other than the year in the birth date according to a preset time format.
In one embodiment, the instructions in the storage medium further comprise:
when the preset keywords are other keywords except addresses and birth dates, shielding specific contents of the preset keywords in the information to be processed in a preset mode, wherein the shielding operation comprises the following steps:
replacing preset keywords in the information to be processed through preset characters; wherein the other keywords except the address and the birth date comprise at least one of the following keywords:
name, landline number, mobile phone number, and mailbox address.
Drawings
Fig. 1 is a flowchart of an information processing method according to an embodiment of the present application;
Fig. 2 is a flowchart of an information processing method according to another embodiment of the present application;
Fig. 3 is a block diagram of an information processing apparatus according to an embodiment of the present application;
Fig. 4 is a block diagram of an information processing apparatus according to another embodiment of the present application.
Detailed Description
Various aspects and features of the present application are described herein with reference to the drawings.
It will be understood that various modifications may be made to the embodiments of the present application. Accordingly, the foregoing description should not be construed as limiting, but merely as exemplifications of embodiments. Those skilled in the art will envision other modifications within the scope and spirit of the application.
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the application and, together with a general description of the application given above and the detailed description of the embodiments given below, serve to explain the principles of the application.
These and other characteristics of the present application will become apparent from the following description of preferred forms of embodiment, given as non-limiting examples, with reference to the attached drawings.
It should also be understood that, although the present application has been described with reference to some specific examples, a person of skill in the art shall certainly be able to achieve many other equivalent forms of application, having the characteristics as set forth in the claims and hence all coming within the field of protection defined thereby.
The above and other aspects, features and advantages of the present application will become more apparent in view of the following detailed description when taken in conjunction with the accompanying drawings.
Specific embodiments of the present application are described hereinafter with reference to the accompanying drawings; however, it is to be understood that the disclosed embodiments are merely exemplary of the application, which can be embodied in various forms. Well-known and/or repeated functions and constructions are not described in detail to avoid obscuring the application with unnecessary or redundant detail. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present application in virtually any appropriately detailed structure.
The specification may use the phrases "in one embodiment," "in another embodiment," "in yet another embodiment," or "in other embodiments," which may each refer to one or more of the same or different embodiments in accordance with the application.
Fig. 1 is a flowchart of an information processing method, which may be used in a computer, according to an embodiment of the present application, including the following steps S11-S14:
in step S11, when the information to be processed is received, it is determined whether the information to be processed is structured text information;
in step S12, when the information to be processed is unstructured text information, extracting preset keywords from structured text information, where the structured text information is associated with the unstructured text information;
in step S13, it is determined whether a preset keyword exists in the information to be processed;
in step S14, when it is determined that the preset keyword exists in the information to be processed, a masking operation is performed on the specific content of the preset keyword in the information to be processed in a preset manner.
In this embodiment, when the information to be processed is received, it is determined whether the information to be processed is structured text information.
Structured text, also referred to as structured data, is data logically expressed as a data table; it is mainly stored and managed in a relational database. Put simply, it is data in a data-table structure. Because structured text strictly conforms to data format and length specifications, it can be easily recognized by electronic devices such as computers and mobile terminals.
Unstructured text is the opposite of structured text: it is not represented by a data table structure, it is complex to recognize during desensitization, and it has no correspondence between fields and records. Therefore, in this embodiment, when the information to be processed is unstructured text information, a preset keyword is first extracted from the structured text information, the structured text information being associated with the unstructured text information; it is then determined whether the preset keyword exists in the information to be processed; and when the preset keyword exists in the information to be processed, the specific content of the preset keyword in the information to be processed is shielded in a preset mode.
For example, suppose an unstructured medical record text to be desensitized contains the following information: Zhang San, born on December 30, 1960, telephone number 1XXXXXXXXXX, residing at XX Community, Building X, Unit X, Baoding City, Hebei Province, was admitted on December 31, 2016 due to severe injury of the cecum; small intestinal adhesion release surgery and right hemicolectomy were performed the next day; 20 days after surgery, recovery was good with no complications, and the patient completed the discharge procedure on the same day.
When the medical record text is received, it is determined to be unstructured text, so preset keywords need to be extracted from the structured text associated with it. In a hospital system, when a patient is admitted, the patient's information and the types of treatment received are entered into the system and stored as structured text (a data table), so the structured text associated with the unstructured text can be exported from the hospital system. Records under the corresponding fields are then extracted from the structured text: the record under the name field necessarily contains the patient's name, Zhang San, and the same holds for the records under the birth date field, the telephone number field, and the address field. The medical record text can then be searched and matched using the records under these fields as keywords to determine whether the keywords exist in the medical record text. When the keywords exist in the medical record text, a shielding operation is performed on the specific content of the keywords. For example, the month and day in the birth date are shielded and only the year is retained; likewise, the detailed address (that is, XX Community, Building X, Unit X) is shielded and only Baoding City, Hebei Province is retained.
After step S14 is performed on the medical record information, the resulting medical record text reads as follows:
The patient (name shielded), born in 1960, telephone number (shielded), residing in Baoding City, Hebei Province, was admitted on December 31, 2016 and then underwent small intestinal adhesion release surgery and right hemicolectomy; 20 days after surgery, recovery was good with no complications, and the patient was discharged on the same day.
It can be seen that, in the information, the name of the patient and the telephone number are masked, the date of birth and the address information are partially masked, and the address information of the year of birth and the administrative division unit of the county level or more is retained, thereby being beneficial to researching the age distribution and the regional distribution of diseases in the scientific research process, and in addition, the admission date is not shielded, this is because the records under the admission date field are not selected as keywords in the structured text, when the record under the admission date field is taken as a key, a specific month and a specific date in a specific admission date may also be masked, that is, the specific month and the specific date in the description of "admission to date 31/12/2016" are masked and displayed as "admission to date/2016", and those skilled in the art can freely set the corresponding fields as necessary.
The beneficial effect of this application is as follows: when the information to be processed is unstructured text information, preset keywords are extracted from the structured text information associated with the unstructured text; when it is determined that the preset keywords exist in the information to be processed, the specific content of the preset keywords in the information to be processed is masked in a preset manner. Desensitization of unstructured text information is thereby achieved, and labor and time costs are reduced.
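The overall flow described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the application: the field names, the `PRESET_FIELDS` table, the `"***"` placeholder, the ISO date layout, and the sample record are all hypothetical, and unstructured input is modeled simply as a `str`.

```python
from typing import Callable, Mapping, Union

# Hypothetical preset fields and the masking rule applied to each one.
PRESET_FIELDS: dict[str, Callable[[str], str]] = {
    "name": lambda value: "***",             # fully masked
    "phone": lambda value: "***",            # fully masked
    "birth_date": lambda value: value[:4],   # assumes "YYYY-MM-DD"; keep the year
}


def desensitize(info: Union[str, Mapping[str, str]], record: Mapping[str, str]) -> str:
    """Mask preset keywords in unstructured text using its associated record."""
    # Judge whether the input is structured; only the unstructured branch is sketched.
    if not isinstance(info, str):
        raise NotImplementedError("structured input is outside this sketch")

    # Step S12: take the records under the preset fields as keywords.
    keywords = {field: record[field] for field in PRESET_FIELDS if record.get(field)}

    # Check whether each keyword occurs in the text, then mask it (step S14).
    for field, keyword in keywords.items():
        if keyword in info:
            info = info.replace(keyword, PRESET_FIELDS[field](keyword))
    return info


# Hypothetical structured record and medical record text.
record = {"name": "Zhang San", "phone": "13800000000",
          "birth_date": "1960-05-12", "admission_date": "2016-12-31"}
text = ("Patient Zhang San, born 1960-05-12, phone 13800000000, "
        "was admitted on 2016-12-31.")
masked = desensitize(text, record)
print(masked)
```

Because `admission_date` is not listed in `PRESET_FIELDS`, the admission date is left untouched, which mirrors the behavior described for fields that are not selected as keywords.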
In one embodiment, as shown in FIG. 2, the above step S12 can be implemented as the following steps S21-S22:
in step S21, acquiring a preset field from the structured text information;
in step S22, information corresponding to the preset field is extracted as a preset keyword.
In this embodiment, a preset field is obtained from the structured text information. For example, to mask name information in the unstructured text, the name field "name" is taken from the structured text, and the information corresponding to the preset field (that is, the record under the "name" field) is determined. That information is then extracted as a preset keyword. For example, if the record under the name field is "Zhang San", then "Zhang San" is extracted as a preset keyword.
It should be noted that the structured text information associated with the unstructured text may be a data table that carries the same identification information as the unstructured text information. For example, if the unstructured text carries identification information such as a document number and a medical record number, the associated structured text carries the same document number and medical record number, so the corresponding structured text can be exported from the hospital system by using this identification information as an index.
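Steps S21-S22, together with the lookup by shared identification information, might look like the sketch below. The table layout, the `record_no` key, and the helper names are assumptions made for illustration; the application does not specify how the hospital system exposes its data.

```python
from typing import Optional

# Hypothetical export of the hospital system's data table.
HOSPITAL_TABLE = [
    {"record_no": "MR-0001", "name": "Zhang San", "phone": "13800000000"},
    {"record_no": "MR-0002", "name": "Li Si", "phone": ""},
]


def find_associated_record(record_no: str, table: list[dict]) -> Optional[dict]:
    """Return the row that shares the medical record number, if any."""
    return next((row for row in table if row["record_no"] == record_no), None)


def extract_preset_keywords(row: dict, preset_fields: list[str]) -> dict[str, str]:
    """Steps S21-S22: read the preset fields and keep their non-empty records."""
    return {field: row[field] for field in preset_fields if row.get(field)}


row = find_associated_record("MR-0001", HOSPITAL_TABLE)
keywords = extract_preset_keywords(row, ["name", "phone"]) if row else {}
print(keywords)
```

Empty records are skipped on purpose: an empty keyword would match everywhere in the text and is of no use for masking.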
In one embodiment, when the preset keyword is an address, the step S14 can be implemented as the following steps A1-A3:
in step A1, address information in the information to be processed is acquired;
in step A2, it is determined whether or not a keyword related to administrative division information is included in the address information;
in step A3, when the address information includes a keyword related to the administrative division information, the address information is modified according to a preset regular expression so that only the address related to the administrative division is retained in the address information.
In this embodiment, the address information in the information to be processed is acquired. The address information may include the province, city, and county, and may also include the user's detailed address, such as the residential community, building number, unit, and house number. It is then judged whether the address information contains keywords related to administrative division information; if it does, the address information is modified according to a preset regular expression so that only the address related to the administrative division is retained. In other words, the user's detailed address is the information to be masked, while information related to the administrative division, such as the province, city, and county where the user is located, should be retained because it is useful for research. For example, when the information to be processed is medical record text, retaining the administrative division makes it possible to count the regional distribution of the diseases recorded in the medical records and to analyze the causes of those diseases by region.
For example, when the address information obtained from the information to be processed is judged to contain the administrative division keywords "province", "city", and "county", the address information is modified according to the preset regular expression, that is, everything other than the administrative division is deleted. The regular expression serves to retain only the address related to the administrative division.
It should be noted that provincial-level administrative districts are identified by the keywords "province", "autonomous region", "municipality directly under the central government", and "special administrative region"; city-level administrative districts are identified by the keywords "city", "prefecture", "autonomous prefecture", and "league"; and county-level administrative districts are identified by keywords such as "city" (county-level city), "district", "county", "autonomous county", "banner", "special district", and "forestry district". In addition, a county-level city is usually written after a prefecture-level city, so when two "city" keywords are detected in one piece of address information, the former is treated as a city-level administrative district and the latter as a county-level administrative district.
The beneficial effect of this embodiment lies in: when the address information is shielded, only the detailed address in the home address is shielded, and the address related to the administrative district is reserved, so that the desensitized information is relatively comprehensive on the basis of realizing the protection of the user information.
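Steps A1-A3 can be illustrated with the following sketch. The application does not disclose its regular expression, so the pattern below is an assumption: it works on Chinese-language addresses, uses assumed Chinese spellings of the level keywords (省, 市, 县, and so on), and excludes "小区" (residential community) with a lookbehind so that it is not mistaken for a district. The sample addresses are hypothetical.

```python
import re
from typing import Optional

# A division name: a short run of Chinese characters, matched lazily.
NAME = r"[\u4e00-\u9fff]{1,12}?"

# Assumed Chinese spellings of the level keywords listed in the text.
ADMIN_PREFIX = re.compile(
    rf"^(?:{NAME}(?:省|自治区|特别行政区))?"       # provincial level
    rf"(?:{NAME}(?:市|地区|自治州|盟))?"           # city level (the first "市")
    rf"(?:{NAME}(?:自治县|县|市|(?<!小)区|旗))?"   # county level (the second "市")
)


def mask_address(address: str) -> Optional[str]:
    """Keep only the administrative prefix; return None if no keyword is found."""
    # Every group is optional, so match() always succeeds; an empty match
    # means no administrative division keyword was recognized (step A2).
    prefix = ADMIN_PREFIX.match(address).group(0)
    return prefix or None


print(mask_address("河北省保定市涞源县阳光小区3号楼2单元"))
print(mask_address("河北省保定市涿州市幸福路8号"))
```

In the second call the first "市" is consumed by the city-level group and the second by the county-level group, which is the two-"city" rule described above. A `None` result signals that the address carries no level keywords and needs a different treatment. A production pattern would need to handle more cases than this sketch does, for example level keywords that also occur inside street or building names.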
In one embodiment, the method may also be implemented as steps B1-B2:
in step B1, when the address information does not include the keyword related to the administrative division information, the administrative division information in the address information is converted according to the address dictionary so that the administrative division information is included in the converted address information;
in step B2, the address information is modified according to a preset regular expression so that only addresses related to the administrative district are reserved for the address information.
In this embodiment, the address information may not contain keywords related to administrative division information. For example, the address in a piece of medical record information may read "Hebei Baoding Laiyuan", which contains none of the keywords "province", "city", or "county", so the specific administrative divisions cannot be identified by those keywords. Because the address dictionary contains the place names Hebei Province, Baoding City, and Laiyuan County, "Hebei" can be converted into "Hebei Province", "Baoding" into "Baoding City", and "Laiyuan" into "Laiyuan County". The converted address information is "Hebei Province, Baoding City, Laiyuan County", which can then be modified according to the preset regular expression so that only the address related to the administrative division is retained.
The beneficial effect of this embodiment lies in: when the address information does not contain the key words related to the administrative division information, the administrative division information in the address information can be converted according to the address dictionary, so that the converted address information contains the administrative division information; therefore, the situation that the address information cannot be effectively shielded because the address information does not contain the keywords related to the administrative district information is avoided.
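Step B1 can be sketched as a walk down a nested dictionary. The dictionary excerpt, the suffix list, and the helper names below are assumptions for illustration; in particular, deriving a short name by stripping the level suffix is a simplification that will not hold for every place name.

```python
# Hypothetical excerpt of an address dictionary: province -> city -> county.
ADDRESS_DICT = {"河北省": {"保定市": {"涞源县": {}, "涿州市": {}}}}

# Level suffixes, longest first so that "自治区" is stripped before "区".
SUFFIXES = ("特别行政区", "自治区", "自治州", "自治县", "地区",
            "省", "市", "县", "区", "旗", "盟")


def short_name(full: str) -> str:
    """Drop the level suffix, e.g. "保定市" -> "保定"."""
    for suffix in SUFFIXES:
        if full.endswith(suffix) and len(full) > len(suffix):
            return full[: -len(suffix)]
    return full


def add_division_keywords(address: str, tree: dict) -> str:
    """Step B1: rewrite leading place names into their full dictionary form."""
    prefix, rest, node = "", address, tree
    while node:
        for full, children in node.items():
            # Accept either the full name or the name without its suffix.
            hit = next((c for c in (full, short_name(full)) if rest.startswith(c)), None)
            if hit:
                prefix, rest, node = prefix + full, rest[len(hit):], children
                break
        else:
            break  # nothing matched at this level; stop descending
    return prefix + rest


print(add_division_keywords("河北保定涞源阳光小区3号楼", ADDRESS_DICT))
```

The converted string now carries the level keywords, so it can be handed to the regular-expression step (B2) in the same way as an address that contained them from the start.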
In one embodiment, the address dictionary is constructed by:
acquiring all administrative district information and the affiliations between the administrative districts;
and constructing an address dictionary according to the acquired administrative district information and the affiliations between the administrative districts, wherein the smallest administrative unit in the address dictionary is a county-level administrative district and the largest is a provincial-level administrative district.
This embodiment describes how the address dictionary is built. First, all administrative district information and the affiliations between administrative districts are acquired, where administrative district information means provincial-level, city-level, and county-level administrative district information. The address dictionary is then constructed from this information and these affiliations, with the county-level administrative district as its smallest unit and the provincial-level administrative district as its largest.
It should be noted that, at present, China has 34 provincial-level administrative districts, 333 prefecture-level administrative districts, and 2846 county-level administrative districts. All of them can be enumerated and placed in the address dictionary.
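One way to build such a dictionary from affiliation records is sketched below. The input layout (a list of division and parent pairs) and the four sample entries are assumptions; the application does not say where the affiliation data comes from or how it is stored.

```python
from typing import Optional

# Hypothetical affiliation records: (division, parent division or None).
AFFILIATIONS: list[tuple[str, Optional[str]]] = [
    ("河北省", None),
    ("保定市", "河北省"),
    ("涞源县", "保定市"),
    ("涿州市", "保定市"),
]


def build_address_dict(affiliations: list[tuple[str, Optional[str]]]) -> dict:
    """Nest each division under its parent: province -> city -> county."""
    # One (initially empty) child mapping per division.
    nodes = {name: {} for name, _ in affiliations}
    tree = {}
    for name, parent in affiliations:
        # Provinces have no parent and sit at the top of the dictionary.
        container = tree if parent is None else nodes[parent]
        container[name] = nodes[name]
    return tree


address_dict = build_address_dict(AFFILIATIONS)
print(address_dict)
```

Keying on names keeps the sketch short, but district names are not unique nationwide, so a dictionary covering the whole country would more safely key each division on an official division code.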
In one embodiment, when the preset keyword is the birth date, the step S14 can be implemented as the following steps:
hiding the information other than the year in the birth date according to a preset time format.
In this embodiment, the information other than the year in the birth date may be deleted. For example, if the information to be processed contains "the patient was born on month XX, day XX of year XX", the month and day can be removed by means of the preset time format, so that the information becomes "the patient was born in year XX". Alternatively, the birth month and day may be replaced with "+" or another character, or blurred, so that they cannot be recognized visually.
The beneficial effect of this embodiment lies in: according to the preset time format, other information except the year information in the birth date is hidden, so that the specific date of the birth of the user is shielded, but the year of the birth of the user is reserved, and therefore the desensitized information is relatively comprehensive on the basis of protecting the user information.
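A possible reading of "preset time format" is sketched below: the birth date taken from the structured record is rendered in a few known layouts, and each rendering found in the text is cut back to the year. The list of layouts and the sample sentence are assumptions, not formats named in the application.

```python
from datetime import date


def mask_birth_date(text: str, birth: date) -> str:
    """Replace known renderings of the birth date with the year only."""
    y, m, d = birth.year, birth.month, birth.day
    # Assumed "preset time formats": (full rendering, year-only rendering).
    formats = [
        (f"{y}年{m}月{d}日", f"{y}年"),
        (f"{y}年{m:02d}月{d:02d}日", f"{y}年"),
        (f"{y}-{m:02d}-{d:02d}", f"{y}"),
        (f"{y}/{m:02d}/{d:02d}", f"{y}"),
    ]
    for full, year_only in formats:
        text = text.replace(full, year_only)
    return text


# Hypothetical sentence: a birth date followed by an admission date.
sample = "患者出生于1960年5月12日，于2016年12月31日入院。"
print(mask_birth_date(sample, date(1960, 5, 12)))
```

Matching against the known birth date, rather than against every date-shaped string, leaves other dates such as the admission date intact unless they are also selected as keywords.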
In one embodiment, when the preset keyword is other than the address and the birth date, the step S14 can be implemented as the following steps:
replacing preset keywords in the information to be processed through preset characters; wherein the other keywords except the address and the birth date comprise at least one of the following keywords:
name, landline number, mobile phone number, and mailbox address.
In this embodiment, when the preset keywords are a name, a landline number, a mobile phone number, and a mailbox address, this information is of no help to scientific research and can therefore be masked entirely. For example, suppose a medical record text contains the following: patient Zhang San, mobile phone number 1XXXXXXXXXXX, landline number XXX-XXXXXXXXX, mailbox address XXX@XX.com. The preset keywords can be replaced entirely with preset characters, leaving: patient, telephone number, landline number, and mailbox address, each with its value masked.
It should be understood that the keywords other than the address and the birth date are not limited to the name, landline number, mobile phone number, and mailbox address, which are merely examples for this embodiment. Any information that those skilled in the art would consider necessary to mask completely during desensitization falls within the scope of the present application.
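Full replacement is the simplest of the masking modes. The sketch below assumes `"***"` as the preset characters and uses made-up sample values; neither is specified in the application.

```python
def mask_fully(text: str, keywords: list[str], placeholder: str = "***") -> str:
    """Replace every occurrence of each keyword with the placeholder."""
    # Longest keywords first, so a keyword that contains another one is
    # replaced as a whole; empty records are skipped.
    for keyword in sorted(filter(None, keywords), key=len, reverse=True):
        text = text.replace(keyword, placeholder)
    return text


# Hypothetical medical record sentence and keywords from the structured record.
sentence = ("Patient Zhang San, mobile 13800000000, landline 0312-5550000, "
            "mailbox zhangsan@example.com.")
keywords = ["Zhang San", "13800000000", "0312-5550000", "zhangsan@example.com", ""]
print(mask_fully(sentence, keywords))
```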
Fig. 3 is a block diagram of an information processing apparatus that can be used in a computer according to an embodiment of the present application, and the apparatus includes the following modules:
the first judging module 31 is configured to, when receiving the information to be processed, judge whether the information to be processed is structured text information;
the extraction module 32 is configured to, when the information to be processed is unstructured text information, extract a preset keyword from the structured text information, where the structured text information is associated with the unstructured text information;
the second judging module 33 is configured to judge whether a preset keyword exists in the information to be processed;
the shielding module 34 is configured to, when it is determined that a preset keyword exists in the to-be-processed information, perform a shielding operation on a specific content of the preset keyword in the to-be-processed information in a preset manner.
In one embodiment, as shown in fig. 4, the extraction module 32 includes:
a first obtaining submodule 41, configured to obtain a preset field from the structured text information;
and the extraction submodule 42 is configured to extract information corresponding to the preset field as a preset keyword.
In one embodiment, the shielding module includes:
the second obtaining sub-module is used for obtaining address information in the information to be processed when the preset keyword is an address;
the judging submodule is used for judging whether the address information contains keywords related to administrative division information or not;
and the modifying submodule is used for modifying the address information according to a preset regular expression when the address information contains keywords related to the administrative division information, so that only addresses related to the administrative division are reserved in the address information.
In one embodiment, the apparatus further comprises:
the conversion module is used for converting the administrative division information in the address information according to an address dictionary when the address information does not contain keywords related to the administrative division information, so that the converted address information contains the administrative division information;
and the modification module is used for modifying the address information according to a preset regular expression so that only the address related to the administrative district is reserved in the address information.
In one embodiment, the address dictionary is constructed by:
acquiring all administrative district information and the affiliations between the administrative districts;
and constructing the address dictionary according to the acquired administrative district information and the affiliations between the administrative districts, wherein the smallest administrative unit in the address dictionary is a county-level administrative district and the largest is a provincial-level administrative district.
In one embodiment, the shielding module includes:
and the deleting submodule is used for hiding other information except the year information in the birth date according to a preset time format when the preset keyword is the birth date.
In one embodiment, the shielding module includes:
the replacing sub-module is used for replacing the preset keywords in the information to be processed through preset characters when the preset keywords are other keywords except addresses and birth dates; wherein the other keywords except the address and the birth date comprise at least one of the following keywords:
name, landline number, mobile phone number, and mailbox address.
The present application also provides an information processing apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor executes the executable instructions to implement the steps of:
when receiving information to be processed, judging whether the information to be processed is structured text information or not;
when the information to be processed is unstructured text information, extracting preset keywords from the structured text information, wherein the structured text information is associated with the unstructured text information;
judging whether the preset keywords exist in the information to be processed or not;
and when the preset keywords exist in the information to be processed, shielding the specific contents of the preset keywords in the information to be processed in a preset mode.
In one embodiment, the processor further executes the executable instructions to implement the steps of:
the extracting of the preset keywords from the structured text information includes:
acquiring a preset field from the structured text information;
and extracting by taking the information corresponding to the preset field as a preset keyword.
In one embodiment, the processor further executes the executable instructions to implement the steps of:
when the preset keyword is an address, shielding the specific content of the preset keyword in the information to be processed in a preset mode, wherein the shielding operation comprises the following steps:
acquiring address information in the information to be processed;
judging whether the address information contains keywords related to administrative division information or not;
and when the address information contains keywords related to administrative division information, modifying the address information according to a preset regular expression so that only addresses related to the administrative division are reserved in the address information.
In one embodiment, the processor further executes the executable instructions to implement the steps of:
when the address information does not contain keywords related to administrative division information, converting the administrative division information in the address information according to an address dictionary so that the converted address information contains the administrative division information;
and modifying the address information according to a preset regular expression so that only addresses related to administrative districts are reserved in the address information.
In one embodiment, the processor further executes the executable instructions to perform the steps of:
the address dictionary is constructed in the following way:
acquiring all administrative district information and the affiliations between the administrative districts;
and constructing the address dictionary according to the acquired administrative district information and the affiliations between the administrative districts, wherein the smallest administrative unit in the address dictionary is a county-level administrative district and the largest is a provincial-level administrative district.
In one embodiment, the processor further executes the executable instructions to implement the steps of:
when the preset keyword is a birth date, shielding the specific content of the preset keyword in the information to be processed in a preset mode, wherein the shielding operation comprises the following steps:
hiding the information other than the year in the birth date according to a preset time format.
In one embodiment, the processor further executes the executable instructions to implement the steps of:
when the preset keywords are other keywords except addresses and birth dates, shielding specific contents of the preset keywords in the information to be processed in a preset mode, wherein the shielding operation comprises the following steps:
replacing preset keywords in the information to be processed through preset characters; wherein the other keywords except the address and the birth date comprise at least one of the following keywords:
name, landline number, mobile phone number, and mailbox address.
The present application also provides a non-transitory readable storage medium in which instructions, when executed by a processor within a device, enable the device to perform a method of information processing, the method comprising:
when receiving information to be processed, judging whether the information to be processed is structured text information or not;
when the information to be processed is unstructured text information, extracting preset keywords from the structured text information, wherein the structured text information is associated with the unstructured text information;
judging whether the preset keywords exist in the information to be processed or not;
and when the preset keywords exist in the information to be processed, shielding the specific contents of the preset keywords in the information to be processed in a preset mode.
In one embodiment, the instructions in the storage medium further comprise:
the extracting of the preset keywords from the structured text information includes:
acquiring a preset field from the structured text information;
and extracting by taking the information corresponding to the preset field as a preset keyword.
In one embodiment, the instructions in the storage medium further comprise:
when the preset keyword is an address, shielding the specific content of the preset keyword in the information to be processed in a preset mode, wherein the shielding operation comprises the following steps:
acquiring address information in the information to be processed;
judging whether the address information contains keywords related to administrative division information or not;
and when the address information contains keywords related to administrative division information, modifying the address information according to a preset regular expression so that only addresses related to the administrative division are reserved in the address information.
In one embodiment, the instructions in the storage medium further comprise:
when the address information does not contain keywords related to administrative division information, converting the administrative division information in the address information according to an address dictionary so that the converted address information contains the administrative division information;
and modifying the address information according to a preset regular expression so that only addresses related to administrative districts are reserved in the address information.
In one embodiment, the instructions in the storage medium further comprise:
the address dictionary is constructed in the following way:
acquiring all administrative district information and the affiliations between the administrative districts;
and constructing the address dictionary according to the acquired administrative district information and the affiliations between the administrative districts, wherein the smallest administrative unit in the address dictionary is a county-level administrative district and the largest is a provincial-level administrative district.
In one embodiment, the instructions in the storage medium further comprise:
when the preset keyword is a birth date, shielding the specific content of the preset keyword in the information to be processed in a preset mode, wherein the shielding operation comprises the following steps:
hiding the information other than the year in the birth date according to a preset time format.
In one embodiment, the instructions in the storage medium further comprise:
when the preset keywords are other keywords except addresses and birth dates, shielding specific contents of the preset keywords in the information to be processed in a preset mode, wherein the shielding operation comprises the following steps:
replacing preset keywords in the information to be processed through preset characters; wherein the other keywords except the address and the birth date comprise at least one of the following keywords:
name, landline number, mobile phone number, and mailbox address.
The above embodiments are only exemplary embodiments of the present application, and are not intended to limit the present application, and the protection scope of the present application is defined by the claims. Various modifications and equivalents may be made by those skilled in the art within the spirit and scope of the present application and such modifications and equivalents should also be considered to be within the scope of the present application.

Claims (10)

1. An information processing method characterized by comprising:
when receiving information to be processed, judging whether the information to be processed is structured text information or not;
when the information to be processed is unstructured text information, extracting preset keywords from the structured text information, wherein the structured text information is associated with the unstructured text information;
judging whether the preset keywords exist in the information to be processed or not;
and when the preset keywords exist in the information to be processed, shielding the specific contents of the preset keywords in the information to be processed in a preset mode.
2. The method of claim 1, wherein the extracting the preset keyword from the structured text information comprises:
acquiring a preset field from the structured text information;
and extracting by taking the information corresponding to the preset field as a preset keyword.
3. The method of claim 2, wherein when the preset keyword is an address, the shielding operation is performed on specific content of the preset keyword in the information to be processed in a preset manner, and the shielding operation includes:
acquiring address information in the information to be processed;
judging whether the address information contains keywords related to administrative division information or not;
and when the address information contains keywords related to administrative division information, modifying the address information according to a preset regular expression so that only addresses related to the administrative division are reserved in the address information.
4. The method of claim 3, wherein the method further comprises:
when the address information does not contain keywords related to administrative division information, converting the administrative division information in the address information according to an address dictionary so that the converted address information contains the administrative division information;
and modifying the address information according to a preset regular expression so that only addresses related to administrative districts are reserved in the address information.
5. The method of claim 4, wherein the address dictionary is constructed by:
acquiring all administrative district information and the affiliations between the administrative districts;
and constructing the address dictionary according to the acquired administrative district information and the affiliations between the administrative districts, wherein the smallest administrative unit in the address dictionary is a county-level administrative district and the largest is a provincial-level administrative district.
6. The method of claim 2, wherein when the preset keyword is a birth date, the shielding operation of the specific content of the preset keyword in the information to be processed in a preset manner comprises:
hiding the information other than the year in the birth date according to a preset time format.
7. The method of claim 2, wherein when the preset keyword is a keyword other than an address and a birth date, the shielding operation of the specific content of the preset keyword in the information to be processed in a preset manner comprises:
replacing preset keywords in the information to be processed through preset characters; wherein the other keywords except the address and the birth date comprise at least one of the following keywords:
name, landline number, mobile phone number, and mailbox address.
8. An information processing apparatus characterized by comprising:
the first judgment module is used for judging whether the information to be processed is structured text information or not when the information to be processed is received;
the extraction module is used for extracting preset keywords from the structured text information when the information to be processed is unstructured text information, and the structured text information is associated with the unstructured text information;
the second judgment module is used for judging whether the preset keywords exist in the information to be processed or not;
and the shielding module is used for shielding the specific content of the preset keyword in the information to be processed in a preset mode when the preset keyword is determined to exist in the information to be processed.
9. An information processing apparatus characterized by comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor executes the executable instructions to implement the steps of:
when receiving information to be processed, judging whether the information to be processed is structured text information or not;
when the information to be processed is unstructured text information, extracting preset keywords from the structured text information, wherein the structured text information is associated with the unstructured text information;
judging whether the preset keywords exist in the information to be processed or not;
and when the preset keywords exist in the information to be processed, shielding the specific contents of the preset keywords in the information to be processed in a preset mode.
10. A non-transitory readable storage medium in which instructions, when executed by a processor within a device, enable the device to perform a method of information processing, the method comprising:
when receiving information to be processed, judging whether the information to be processed is structured text information or not;
when the information to be processed is unstructured text information, extracting preset keywords from the structured text information, wherein the structured text information is associated with the unstructured text information;
judging whether the preset keywords exist in the information to be processed or not;
and when the preset keywords exist in the information to be processed, shielding the specific contents of the preset keywords in the information to be processed in a preset mode.
CN201911413820.2A 2019-12-31 2019-12-31 Information processing method and device Pending CN111143882A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911413820.2A CN111143882A (en) 2019-12-31 2019-12-31 Information processing method and device

Publications (1)

Publication Number Publication Date
CN111143882A true CN111143882A (en) 2020-05-12

Family

ID=70522595

Country Status (1)

Country Link
CN (1) CN111143882A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120150773A1 (en) * 2010-12-14 2012-06-14 Dicorpo Phillip User interface and workflow for performing machine learning
CN106599322A (en) * 2017-01-03 2017-04-26 北京网智天元科技股份有限公司 Data desensitization method and device
CN107239507A (en) * 2017-05-14 2017-10-10 四川盛世天成信息技术有限公司 Intelligent sensing method and system for characteristic data in data desensitization
CN107480549A (en) * 2017-06-28 2017-12-15 银江股份有限公司 Sensitive information desensitization method and system for data sharing
CN109522740A (en) * 2018-10-16 2019-03-26 易保互联医疗信息科技(北京)有限公司 Health data de-privacy processing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200512