WO2021245833A1 - System, method, and program for displaying black-painted portions of a document - Google Patents


Info

Publication number
WO2021245833A1
Authority
WO
WIPO (PCT)
Prior art keywords
black
painted
document
unit
trained model
Legal status
Ceased
Application number
PCT/JP2020/021904
Inventor
貴司 及川
崇則 小林
晃久 津田
久史 永井
Current Assignee
NEC Corp
Original Assignee
NEC Corp
Application filed by NEC Corp
Priority to JP2022529216A (patent/JP7513089B2/ja)
Priority to PCT/JP2020/021904 (patent/WO2021245833A1/ja)
Priority to US18/007,761 (patent/US20230334164A1/en)
Publication of WO2021245833A1 (patent/WO2021245833A1/ja)
Anticipated expiration (legal status)
Ceased (current legal status)


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G06N3/091 Active learning

Definitions

  • The present invention relates to a system, a method, and a program for displaying the black-painted (redacted) portions of a document.
  • Patent Document 1 discloses an information sharing system having a security module unit 113 that identifies, as accessible feature information, the feature information whose access right data indicates that it may be accessed.
  • In the feature information access right table 116 shown in FIG. 5A, feature information access rights are set in association with document security attribute information.
  • In the feature information access right table 116 shown in FIG. 5B, feature information access rights are set for each individual document.
  • The document security attribute 401 indicates the security attribute information assigned to the document data.
  • The feature information 402 indicates, for each feature type 403, the access right to the feature information extracted from the document data.
  • The feature type 403 is an index for classifying the feature information 402, and the neighborhood display 404 indicates whether the neighborhood data can be displayed.
  • Patent Document 2 discloses an information disclosure program that lets a worker create a disclosure document without having to know which information must be concealed and which must not.
  • Paragraph 0011 states: "... The auxiliary storage device 8 of the server 1 further stores a master document 12, a disclosure document 13, a non-disclosure dictionary 14, a compulsory disclosure dictionary 15, and a comment dictionary 16."
  • Paragraph 0012 states: "... the master document 12 is created. In the master document 12, each character string to be concealed is given a non-disclosure tag and a non-disclosure reason."
  • Paragraph 0013 states: "The examiner runs the compulsory disclosure program 10 on the master document 12 created by the document creator, and among the character strings that are not disclosed in the master document 12 ... the disclosure document 13 is created."
  • Patent Document 3 discloses a public document creation support device that reduces the burden of identifying the portions to be concealed and supports the creation of a public document that reliably reflects the result of the approval (settlement) process.
  • Paragraph 0018 states: "... This hard disk 14 holds a character string search condition table T1 as shown in FIG. 3 and a concealment candidate image area data table T2 as shown in FIG. ..."
  • Paragraph 0019 states: "As shown in FIG. 3, the character string search condition table T1 lists in advance, for each non-disclosure information category, the character strings to be searched for. ... 'Personal information' and 'national security information' are shown as categories of non-disclosure information ..."
  • Paragraph 0024 states: "... each character string is highlighted in the manner specified for the category to which it belongs (S7) ..."
  • Paragraph 0046 states: "... the public document creation support device 1 receives this post-settlement document data and outputs it to the printer 3 to print the public document. At this time, the portions bearing concealment annotations are filled in black."
  • Patent Document 4 describes an entailment determination method in which, for each of a plurality of simple sentences in a hypothesis sentence, a simple sentence with a similar meaning is extracted from a target text containing a plurality of simple sentences, and, for each of the hypothesis and the target text, discourse relation information is generated that indicates the discourse relations (the order in which events occur) between the simple sentences, based on the order in which the simple sentences appear before and after connectives.
  • Based on the discourse relation information, a discourse relation distance is calculated, namely the number of crossings between the discourse relations among the simple sentences of the hypothesis and the positions of the extracted simple sentences, and whether the target text entails the hypothesis is determined from a value that includes the discourse relation distance and a predetermined threshold.
  • Patent Document 1: Japanese Unexamined Patent Publication No. 2010-272802; Patent Document 2: Japanese Unexamined Patent Publication No. 2004-118599; Patent Document 3: Japanese Unexamined Patent Publication No. 2003-132056; Patent Document 4: Japanese Patent No. 6578941.
  • An object of the present invention is to provide a system, a method, and a program that make the black-painting of documents more efficient.
  • A black-painted portion display system for a document is provided, comprising: a black-painted target document determination unit that determines a black-painted target document containing the input text; a trained model generation unit that trains a model using, as training data, one or more documents and correct answer data specifying the black-painted portions in those documents, and generates a trained model; a black-painted portion prediction unit that uses the trained model to predict and output the black-painted portions of the black-painted target document; and a black-painted portion display unit that displays the black-painted portions of the black-painted target document.
  • A method for displaying the black-painted portions of a document is provided, comprising: a step of determining a black-painted target document containing the input text; a step of training a model, using as training data one or more documents and correct answer data specifying the black-painted portions in those documents, to generate a trained model; a prediction step of predicting and outputting, with the trained model, the black-painted portions of the black-painted target document; and a step of displaying the black-painted portions of the black-painted target document. This method is tied to a specific machine, namely a computer having the functions of determining the black-painted target document, generating the trained model, predicting and outputting the black-painted portions, and displaying them.
  • A computer including a processor and a storage device is caused to execute a process of determining a black-painted target document containing input text, ... one or more documents and the black-painted portions in those documents ...
  • A black-painted portion display program for a document is provided, which executes ... a process of displaying the black-painted portions of the target document. This program can be recorded on a computer-readable (non-transitory) storage medium; that is, the present invention can also be embodied as a computer program product.
  • The present invention can be realized by a black-painted portion display system 1 for a document having a black-painted target document determination unit 10, a trained model generation unit 20, a black-painted portion prediction unit 30, and a black-painted portion display unit 40.
  • The black-painted target document determination unit 10 of the black-painted portion display system 1 determines the black-painted target document that contains the input text.
  • The trained model generation unit 20 trains a model using, as training data, one or more documents and correct answer data specifying the black-painted portions in those documents, and generates a trained model. In training, one or more documents from the training data are input to a neural network, and the network's weight parameters are adjusted so as to minimize the error between the black-painted portions it predicts and the correct answer data for each document; the result is the trained model.
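The training step above can be illustrated with a deliberately small model. The sketch below is an assumption, not the patent's network: it treats black-painting as per-token binary labelling and replaces the neural network with a single-layer logistic model over two hypothetical features (the token and its left neighbour), adjusting the weights by stochastic gradient descent on binary cross-entropy, which is one common choice for the "error" being minimized.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

def features(tokens: List[str], i: int) -> List[str]:
    # Hypothetical features: the token itself and its left neighbour ("<s>" at the start).
    prev = tokens[i - 1] if i > 0 else "<s>"
    return [f"w={tokens[i]}", f"prev={prev}"]

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(examples: List[Tuple[List[str], List[int]]],
          epochs: int = 50, lr: float = 0.5) -> Dict[str, float]:
    """Stochastic gradient descent on binary cross-entropy.

    Each example is (tokens, labels); labels[i] == 1 marks a black-painted token
    in the correct answer data."""
    weights: Dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for tokens, labels in examples:
            for i, y in enumerate(labels):
                feats = features(tokens, i)
                p = sigmoid(sum(weights[f] for f in feats))
                # The cross-entropy gradient w.r.t. each active feature weight is (p - y).
                for f in feats:
                    weights[f] -= lr * (p - y)
    return dict(weights)

def predict(weights: Dict[str, float], tokens: List[str]) -> List[int]:
    # Unseen features contribute 0; a token is black-painted when p > 0.5.
    return [int(sigmoid(sum(weights.get(f, 0.0) for f in features(tokens, i))) > 0.5)
            for i in range(len(tokens))]

if __name__ == "__main__":
    examples = [
        ("Mr. Tanaka visited the office".split(), [0, 1, 0, 0, 0]),
        ("Mr. Sato approved the budget".split(), [0, 1, 0, 0, 0]),
    ]
    weights = train(examples)
    print(predict(weights, "Mr. Suzuki visited the office".split()))  # [0, 1, 0, 0, 0]
```

Because the feature `prev=Mr.` only ever appears on black-painted tokens in this toy data, its weight is driven positive, so an unseen name after "Mr." is still flagged. A real system would learn such cues from far more data and richer representations.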
  • The black-painted portion prediction unit 30 uses the generated trained model to predict and output the black-painted portions of the black-painted target document determined by the black-painted target document determination unit 10.
  • The black-painted portion display unit 40 displays the predicted black-painted portions of the black-painted target document.
  • Training data may be divided into a plurality of groups whose policies differ by predetermined institution, organization, or department (for example, by ministry, agency, or minister); a model is trained on each group, so that separate training is performed for each policy.
  • The neural network may be a deep neural network.
  • The neural network may be an RNN (recurrent neural network), an LSTM (long short-term memory) network, a CNN (convolutional neural network), or any combination of these.
  • The document to be black-painted may contain text, images, or both.
  • The document to be black-painted may also be one obtained by speech recognition.
  • FIG. 2 is a diagram showing the operation of the black-painted portion display unit of the black-painted portion display system according to the embodiment of the present invention.
  • FIG. 2 shows an example of the display screen 50: the black-painted target document determined by the black-painted target document determination unit 10 of the system 1 is displayed in the lower left, and a document showing the black-painted portions predicted by the black-painted portion prediction unit 30 is displayed alongside it in the lower right.
  • As described above, with the black-painted portion display system according to the embodiment of the present invention, the black-painted portions of the document to be black-painted are predicted by the trained model and displayed, making the black-painting of documents more efficient and less labor-intensive. It is also possible to recommend black-painted portions that conform to the black-painting policy of each predetermined institution, organization, or department, for example each ministry, agency, or minister.
  • FIG. 3 is a diagram showing a configuration of a black-painted portion display system for a document according to the first embodiment of the present invention.
  • FIG. 3 shows a configuration including a textual entailment recognition unit 110, a user terminal 111, a document storage unit 112, a trained model generation unit 120, a training data storage unit 121, a document database 130, a black-painted processing unit 140, and a black-painted portion display unit 150.
  • The document storage unit 112 stores documents that are candidates for black-painting.
  • The textual entailment recognition unit 110 has a textual entailment recognition function. For each of the simple sentences in the input text, it extracts a simple sentence with a similar meaning from a document containing a plurality of simple sentences. For each of the input text and the document, it generates discourse relation information indicating the discourse relations (the order in which events occur) between the simple sentences, based on the order in which the simple sentences appear before and after connectives.
  • Based on this information, it calculates a discourse relation distance, namely the number of crossings between the discourse relations among the simple sentences and the positions of the extracted simple sentences, and determines whether the document entails the input text based on a value including the discourse relation distance and a predetermined threshold.
  • Textual entailment recognition is described in Patent Document 4.
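The entailment check described above can be sketched as follows. This is a simplified reading, not the method of Patent Document 4: simple sentences are split at periods rather than analysed around connectives, word-set (Jaccard) overlap stands in for semantic similarity, the discourse relation distance is approximated as the number of crossings between the order of the input text's simple sentences and the positions of their best matches in the document, and both thresholds are illustrative values.

```python
import re
from itertools import combinations
from typing import List

def split_simple_sentences(text: str) -> List[str]:
    # Assumption: each period-terminated clause ("." or "。") is one simple sentence.
    return [s.strip().lower() for s in re.split(r"[.。]", text) if s.strip()]

def similarity(a: str, b: str) -> float:
    # Assumption: Jaccard overlap of word sets stands in for semantic similarity.
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def crossing_count(positions: List[int]) -> int:
    # Pairs that are in order in the input text but reversed in the document.
    return sum(1 for i, j in combinations(range(len(positions)), 2)
               if positions[i] > positions[j])

def entails(document: str, input_text: str,
            sim_threshold: float = 0.5, max_crossings: int = 0) -> bool:
    doc_sentences = split_simple_sentences(document)
    positions = []
    for sentence in split_simple_sentences(input_text):
        scores = [similarity(sentence, d) for d in doc_sentences]
        if not scores or max(scores) < sim_threshold:
            return False  # no sufficiently similar simple sentence in the document
        positions.append(scores.index(max(scores)))
    return crossing_count(positions) <= max_crossings

if __name__ == "__main__":
    doc = "The committee met in May. The budget was approved. The report was published."
    print(entails(doc, "The committee met in May. The budget was approved."))  # True
    print(entails(doc, "The budget was approved. The committee met in May."))  # False
```

The second call fails only because of the order check: both simple sentences match, but in reverse order, giving one crossing.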
  • From the documents stored in the document storage unit 112, the textual entailment recognition unit 110 determines, as the black-painted target document, a document that contains (entails) the input text entered from the user terminal 111.
  • The determined black-painted target document is stored in the document database 130.
  • The trained model generation unit 120 trains the model using the training data stored in the training data storage unit 121, which consists of one or more documents and correct answer data specifying the black-painted portions in those documents.
  • Model training by the trained model generation unit 120 is the same as that performed by the trained model generation unit 20 in the embodiment described above.
  • The trained model generated by the trained model generation unit 120 is stored in the document database 130.
  • Training data for a plurality of groups with different policies, for each predetermined institution, organization, or department (for example, each ministry or agency), can also be stored in the training data storage unit 121. A model can then be trained on each group to generate separate trained models, and the resulting trained models, each reflecting a different policy, can be stored in the document database 130.
  • The black-painted processing unit 140 uses the trained model stored in the document database 130 to predict the black-painted portions of the black-painted target document, also stored there, determines those portions, and outputs a black-painted document in which they are blacked out.
  • From among the different trained models stored in the document database 130 for groups with different policies (for each predetermined institution, organization, or department, for example each ministry or agency), the black-painted processing unit 140 can use the trained model applicable to the black-painted target document to predict its black-painted portions.
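Selecting a per-policy trained model can be sketched as a registry keyed by policy group. The class name, the group keys, and the toy lambda "models" below are hypothetical placeholders for the trained models stored in the document database 130; any callable that maps tokens to 0/1 labels can be registered.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A "model" here is any callable mapping tokens to 0/1 black-paint labels (assumption).
Predictor = Callable[[List[str]], List[int]]

@dataclass
class ModelRegistry:
    """Hypothetical stand-in for the document database 130: one trained model per policy group."""
    models: Dict[str, Predictor] = field(default_factory=dict)

    def register(self, group: str, model: Predictor) -> None:
        self.models[group] = model

    def predict(self, group: str, tokens: List[str]) -> List[int]:
        # Use the trained model that applies to the target document's policy group.
        if group not in self.models:
            raise KeyError(f"no trained model for policy group {group!r}")
        return self.models[group](tokens)

if __name__ == "__main__":
    registry = ModelRegistry()
    # Toy placeholders: "ministry-a" conceals numbers, "ministry-b" conceals capitalized words.
    registry.register("ministry-a", lambda toks: [int(t.isdigit()) for t in toks])
    registry.register("ministry-b", lambda toks: [int(t[:1].isupper()) for t in toks])
    tokens = ["Budget", "is", "500"]
    print(registry.predict("ministry-a", tokens))  # [0, 0, 1]
    print(registry.predict("ministry-b", tokens))  # [1, 0, 0]
```

Raising on an unknown group, rather than silently falling back to some default model, keeps a document from being black-painted under the wrong policy.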
  • The black-painted portion display unit 150 displays the black-painted document output by the black-painted processing unit 140.
  • FIG. 4 is a diagram showing the configuration of the textual entailment recognition unit of the black-painted portion display system according to the first embodiment of the present invention.
  • The textual entailment recognition unit 110 includes a textual entailment recognition processing unit 1101 and a black-painted target document extraction / selection unit 1102.
  • From the many candidate documents stored in the document storage unit 112, the textual entailment recognition processing unit 1101 determines and extracts the documents that entail the input text entered from the user terminal 111, and sends them to the black-painted target document extraction / selection unit 1102.
  • The black-painted target document extraction / selection unit 1102 presents the extracted documents on the user terminal.
  • The user's document selection is sent from the user terminal to the black-painted target document extraction / selection unit 1102, which, according to this selection, determines the selected document as the black-painted target document. The determined black-painted target document is stored in the document database 130.
  • FIG. 5 is a diagram showing the configuration of the trained model generation unit of the black-painted portion display system according to the first embodiment of the present invention.
  • The trained model generation unit 120 has a training data conversion unit 1201 and a model learning unit 1202.
  • The training data conversion unit 1201 performs preprocessing that converts the training data into the format the model learning unit 1202 uses for model training.
  • The preprocessing includes, for example (but is not limited to), converting the words in a black-painted target document into their distributed representations.
  • The model learning unit 1202 trains the model using the training data supplied by the training data conversion unit 1201.
  • Model training by the model learning unit 1202 is the same as that performed by the trained model generation unit 20 in the embodiment described above.
  • A distributed representation of words expresses the meaning of a word as a high-dimensional real-valued vector; known methods include Word2vec, GloVe (Global Vectors for Word Representation), fastText, and BERT (Bidirectional Encoder Representations from Transformers).
  • The training data conversion unit 1201 may convert the words in the black-painted target document using distributed representations learned from text related to administrative documents.
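A minimal sketch of the conversion step, assuming the correct answer data is given as character spans and that a small dictionary stands in for pretrained word vectors (for example from Word2vec or fastText). The tokenizer, the function names, and the zero vector for unknown words are all assumptions; a Japanese pipeline would typically tokenize with a morphological analyzer instead of a regular expression.

```python
import re
from typing import Dict, List, Sequence, Tuple

Span = Tuple[int, int]        # [start, end) character offsets of a black-painted portion
Token = Tuple[str, int, int]  # (surface form, start, end)

def tokenize_with_offsets(text: str) -> List[Token]:
    # Assumption: words and single punctuation marks are tokens.
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]

def token_labels(tokens: List[Token], spans: Sequence[Span]) -> List[int]:
    # A token is labelled 1 if it overlaps any black-painted span in the correct answer data.
    return [int(any(s < end and start < e for s, e in spans)) for _, start, end in tokens]

def embed(tokens: List[Token], table: Dict[str, List[float]], dim: int) -> List[List[float]]:
    # Look up each word's distributed representation; unknown words map to a zero vector.
    return [table.get(word.lower(), [0.0] * dim) for word, _, _ in tokens]

def convert(text: str, spans: Sequence[Span], table: Dict[str, List[float]], dim: int):
    tokens = tokenize_with_offsets(text)
    return embed(tokens, table, dim), token_labels(tokens, spans)

if __name__ == "__main__":
    vectors, labels = convert("Contact Tanaka at 555-0100.", [(8, 14)],
                              {"tanaka": [1.0, 0.0]}, dim=2)
    print(labels)  # [0, 1, 0, 0, 0, 0, 0]
```

The same conversion would be applied to black-painted target documents at prediction time (without the labels), mirroring the correspondence the source describes between units 1201 and 1411.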
  • FIG. 6 is a diagram showing the configuration of the black-painted processing unit of the black-painted portion display system according to the first embodiment of the present invention.
  • The black-painted processing unit 140 has a black-painted portion extraction unit 141 and a black-painted document creation unit 142.
  • For an input black-painted target document, the black-painted portion extraction unit 141 predicts and determines its black-painted portions.
  • The black-painted document creation unit 142 creates and outputs a black-painted document in which the determined portions are blacked out.
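For plain text, creating the black-painted document can be as simple as overwriting each determined span. The function below is a hypothetical sketch that assumes spans are `[start, end)` character offsets; it replaces the characters themselves, so the concealed text does not survive in the output string.

```python
from typing import List, Tuple

def create_black_painted_document(text: str, spans: List[Tuple[int, int]],
                                  mask: str = "█") -> str:
    """Return a copy of text in which every character inside a [start, end) span is replaced by mask."""
    chars = list(text)
    for start, end in spans:
        # Clamp spans to the text so out-of-range predictions cannot raise.
        for i in range(max(0, start), min(end, len(chars))):
            chars[i] = mask
    return "".join(chars)

if __name__ == "__main__":
    print(create_black_painted_document("Contact Tanaka at 555-0100.", [(8, 14), (18, 26)]))
    # Contact ██████ at ████████.
```

Masking one symbol per character keeps the length of each concealed portion visible; a system that should also hide lengths could emit a fixed-width block per span instead.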
  • FIG. 7 is a diagram showing the configuration of the black-painted portion extraction unit of the black-painted portion display system according to the first embodiment of the present invention.
  • The black-painted portion extraction unit 141 has a document data conversion unit 1411 and a black-painted portion prediction unit 1412.
  • A trained model is set in the black-painted portion prediction unit 1412.
  • The document data conversion unit 1411 performs preprocessing that converts a black-painted target document into the format the black-painted portion prediction unit 1412 uses for prediction. This preprocessing applies to the black-painted target document the same processing that the training data conversion unit 1201 applies to the training data.
  • The black-painted portion prediction unit 1412 inputs the black-painted target document received from the document data conversion unit 1411 into the trained model, predicts and determines its black-painted portions, and outputs them.
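If the trained model outputs one label per token, those labels have to be turned back into black-painted portions. One plausible way, assumed here rather than taken from the source, is to merge runs of consecutive positively predicted tokens into character spans; the function name and the `max_gap` parameter are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # [start, end) character offsets

def predictions_to_spans(offsets: List[Span], labels: List[int],
                         max_gap: int = 1) -> List[Span]:
    """Merge runs of tokens predicted as black-painted (label 1) into character spans.

    Two positive tokens are merged only if they are adjacent in the token sequence
    and separated by at most max_gap characters (e.g. one space)."""
    spans: List[Span] = []
    prev_positive = False
    for (start, end), label in zip(offsets, labels):
        if label and prev_positive and start - spans[-1][1] <= max_gap:
            spans[-1] = (spans[-1][0], end)  # extend the current span
        elif label:
            spans.append((start, end))  # start a new span
        prev_positive = bool(label)
    return spans

if __name__ == "__main__":
    # Tokens of "Contact Tanaka Taro at 555": Contact, Tanaka, Taro, at, 555
    offsets = [(0, 7), (8, 14), (15, 19), (20, 22), (23, 26)]
    print(predictions_to_spans(offsets, [0, 1, 1, 0, 1]))  # [(8, 19), (23, 26)]
```

Requiring the previous token to be positive prevents a negatively predicted token (such as a hyphen between two numbers) from being swallowed into a merged span.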
  • FIG. 8 is a diagram showing the configuration of a black-painted portion display system for a document according to a second embodiment of the present invention.
  • The black-painted portion display system 200 of the second embodiment includes a document management AI search server 210, a storage unit 220, and a user terminal 230.
  • The document management AI search server 210 includes a prediction unit 211, an acquisition unit 212, a training data generation unit 213, a model learning unit 214, and a black-painted target document extraction / selection unit 215.
  • The black-painted target document extraction / selection unit 215 has the textual entailment recognition function, described above, for determining documents that entail the input text.
  • FIG. 9 is a flowchart explaining the operation of the black-painted portion display system according to the second embodiment of the present invention.
  • The black-painted target document extraction / selection unit 215 receives a search query (input text) from the user terminal 230.
  • Using the textual entailment recognition function, the black-painted target document extraction / selection unit 215 extracts from the storage unit 220 the documents that contain the received search query (input text).
  • In step S30, the black-painted target document extraction / selection unit 215 presents the extracted documents on the user terminal.
  • The acquisition unit 212 accepts, from the user terminal 230, the selection of the document to be black-painted.
  • Using the trained model, the prediction unit 211 presents the black-painted portions of the black-painted target document on the user terminal 230.
  • FIG. 10 is a flowchart explaining the operation of the black-painted portion display system according to the second embodiment of the present invention.
  • The flowchart in FIG. 10 shows the operation of step S50 of the flowchart in FIG. 9 in more detail.
  • The acquisition unit 212 acquires, from the user terminal, training data consisting of document data and correct answer data (the designated black-painted portions).
  • The training data generation unit 213 generates the training data used for training in the model learning unit 214.
  • The training data generation unit 213 can also generate training data separately for each of a plurality of groups, for each predetermined institution, organization, or department (for example, each ministry, agency, or minister).
  • The model learning unit 214 trains the model on the training data to generate a trained model and stores the trained model in the storage unit 220.
  • In training, one or more documents from the training data are input to the neural network, and its weight parameters are adjusted to minimize the error between the black-painted portions it predicts and the correct answer data for each document, generating a trained model.
  • The model learning unit 214 may train on the training data generated by the training data generation unit 213 for each group (each ministry, agency, or minister), generate a trained model for each group, and store each of them in the storage unit 220.
  • In step S540, the black-painted target document, that is, the document selected for prediction from the user terminal 230 via the acquisition unit 212 in step S40 of FIG. 9, is read from the storage unit 220 into the prediction unit 211.
  • In step S550, the acquisition unit 212 receives from the user terminal 230 the designation of the trained model to be used for prediction.
  • The acquisition unit 212 specifies the trained model to be used by the prediction unit 211, and the prediction unit 211 reads that trained model from the storage unit 220.
  • The prediction unit 211 predicts the black-painted portions of the read black-painted target document with the read trained model and presents them on the user terminal 230.
  • FIG. 11 is a diagram showing the configuration of a black-painted portion display system for a document according to a third embodiment of the present invention.
  • Components with the same reference numerals as in FIG. 3 are the same components.
  • The third embodiment adds a black-painted portion change / reason display / list display reception unit 160 to the configuration of the black-painted portion display system of the first embodiment described with reference to FIG. 3.
  • In the black-painted processing unit 140 of the third embodiment shown in FIG. 12, the black-painted document creation unit 142 receives a black-painted portion change instruction output from the black-painted portion change / reason display / list display reception unit 160 and, according to this instruction, performs processing such as deleting a black-painted portion determined by the black-painted portion extraction unit 141 or setting a black-painted portion at another location. The third embodiment thus makes it possible to change the black-painted portions determined by the black-painted portion extraction unit 141.
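The change instructions from the reception unit 160 can be modelled as a small set of edit operations on the determined spans. The `ChangeInstruction` type and its `delete`/`add` actions are hypothetical names for the two operations the source mentions, and deletion here requires an exact span match, which is a simplification.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Span = Tuple[int, int]  # [start, end) character offsets

@dataclass(frozen=True)
class ChangeInstruction:
    """Hypothetical change instruction: action is "delete" or "add"."""
    action: str
    span: Span

def apply_changes(spans: Iterable[Span], changes: Iterable[ChangeInstruction]) -> List[Span]:
    result = set(spans)
    for change in changes:
        if change.action == "delete":
            result.discard(change.span)  # exact-match deletion (simplification)
        elif change.action == "add":
            result.add(change.span)  # black-paint another location
        else:
            raise ValueError(f"unknown action {change.action!r}")
    return sorted(result)

if __name__ == "__main__":
    predicted = [(8, 14), (18, 26)]
    edits = [ChangeInstruction("delete", (18, 26)), ChangeInstruction("add", (0, 7))]
    print(apply_changes(predicted, edits))  # [(0, 7), (8, 14)]
```

Keeping the instructions as explicit records, rather than editing spans in place, also makes them easy to log as a change history.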
  • FIG. 13 is a diagram showing the configuration of a black-painted portion display system for a document according to a fourth embodiment of the present invention.
  • Components with the same reference numerals as in FIG. 11 are the same components.
  • The fourth embodiment adds a black-painted portion change history storage unit 170 to the configuration of the black-painted portion display system of the third embodiment described with reference to FIG. 11.
  • FIG. 14 is a diagram showing the configuration of the black-painted portion display system according to the fourth embodiment of the present invention, including the black-painted portion change history storage unit.
  • In the black-painted processing unit 140 of the fourth embodiment shown in FIG. 14, the black-painted document creation unit 142 receives an instruction output from the black-painted portion change / reason display / list display reception unit 160 and, according to this instruction, can change the black-painted portions determined by the black-painted portion extraction unit 141.
  • The black-painted portion change history storage unit 170 accumulates this change history together with the black-painted target document.
  • Once a certain number of change histories have accumulated, they can be used as training data when the trained model generation unit 120 generates a trained model.
  • FIG. 15 is a diagram showing the configuration of a black-painted portion display system for a document according to a fifth embodiment of the present invention.
  • Components with the same reference numerals as in FIG. 13 are the same components.
  • The fifth embodiment adds a connection from the black-painted portion change history storage unit 170 to the training data storage unit 121 in the black-painted portion display system of the fourth embodiment described with reference to FIG. 13. The differences from the fourth embodiment are mainly described below.
  • The black-painted portion change history storage unit 170 of the fifth embodiment shown in FIG. ... accumulates this change history.
  • The accumulated black-painted target documents and their black-painted portion change histories are sent from the black-painted portion change history storage unit 170 to the training data storage unit 121.
  • The training data storage unit 121 stores the received black-painted target documents and change histories as retraining data, and the trained model generation unit 120 can regenerate the trained model by retraining the model on the accumulated retraining data.
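The retraining loop of the fourth and fifth embodiments can be sketched as a store that accumulates user-corrected documents and calls a retraining hook once a threshold is reached. The class name, the threshold value, and the `retrain` callback are assumptions; the source only says that retraining happens after "a certain number" of change histories have accumulated.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Span = Tuple[int, int]
Example = Tuple[str, List[Span]]  # (document text, final black-painted spans)

@dataclass
class ChangeHistoryStore:
    """Hypothetical model of storage unit 170 feeding training data storage unit 121."""
    retrain_threshold: int
    retrain: Callable[[List[Example]], None]
    pending: List[Example] = field(default_factory=list)

    def record(self, document: str, final_spans: List[Span]) -> bool:
        """Store a user-corrected document as new correct answer data.

        Returns True if this record triggered a retraining run."""
        self.pending.append((document, list(final_spans)))
        if len(self.pending) < self.retrain_threshold:
            return False
        batch, self.pending = self.pending, []
        self.retrain(batch)
        return True

if __name__ == "__main__":
    store = ChangeHistoryStore(retrain_threshold=2,
                               retrain=lambda batch: print(f"retraining on {len(batch)} documents"))
    store.record("doc1", [(0, 3)])
    store.record("doc2", [(4, 6)])  # prints: retraining on 2 documents
```

Storing the final spans after the user's changes, rather than the individual edits, gives the retraining step ready-made correct answer data in the same form as the original training data.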
  • FIG. 16 shows the configuration of the black coating processing unit according to the sixth embodiment of the present invention, and the configuration of the black coating processing unit shown in FIG. 12 of the third embodiment of the present invention has been changed. It is a thing.
  • the black-painted processing unit 140 of the sixth embodiment of the present invention shown in FIG. 16 has a black-painted document creation unit 142 and a black-painted portion reason display unit 1421.
  • FIG. 17 is a diagram showing an example of display of a black-painted portion of a document black-painted portion display system according to a sixth embodiment of the present invention. In FIG. 17, the components assigned the same numbers as those in FIG. 2 indicate the same components.
  • The trained model used to predict the black-painted portions of a document is generated from training data prepared separately for each of a plurality of groups, each group corresponding to a predetermined institution, organization, or department (for example, a ministry or agency). Each trained model therefore reflects its own group's black-painting policy.
  • The black-painted portion reason display unit 1421 receives from the black-painted portion extraction unit 141 the extracted black-painted portions, together with the black-painting reasons corresponding to the policy associated with the training data used to generate the trained model, and holds them.
  • When the user designates a black-painted portion with a pointing device such as a mouse, the location change / reason display / list display reception unit 160 accepts the designation and sends it to the black-painted document creation unit 142.
  • The black-painting reason corresponding to each black-painted portion, held by the black-painted portion reason display unit 1421, is sent to the black-painted portion display unit 150 together with the black-painted document, and the black-painted portion display unit 150 displays the black-painting reason 51 on the display screen 50.
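The reason display of the sixth embodiment amounts to a lookup from a designated position to the black-painted portion that covers it. The sketch below is hedged: the `BlackPaintedPortion` record, the character-offset representation, and the function name `reason_at` are hypothetical; in the described system each reason comes from the policy associated with the group whose trained model predicted the portion.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BlackPaintedPortion:
    """A black-painted portion tagged with the policy of the model that predicted it (hypothetical)."""
    start: int  # inclusive character offset
    end: int  # exclusive character offset
    policy_name: str
    reason: str


def reason_at(portions: List[BlackPaintedPortion], offset: int) -> Optional[str]:
    """Return the black-painting reason of the portion containing `offset`, or None if none does."""
    for portion in portions:
        if portion.start <= offset < portion.end:
            return portion.reason
    return None
```

A pointer event would first be converted to a character offset (for example, by the GUI toolkit's hit testing) and then passed to `reason_at`; `None` means the user pointed outside every black-painted portion. A linear scan is adequate for a handful of portions; an interval tree would suit documents with many.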
  • FIG. 18 shows the configuration of the black-painting processing unit according to the seventh embodiment of the present invention, which is a modification of the configuration of the black-painting processing unit of the third embodiment shown in FIG. 12.
  • The black-painting processing unit 140 of the seventh embodiment shown in FIG. 18 has a black-painted document creation unit 142 and a black-painted portion list display unit 1422.
  • FIG. 19 shows an example of a list display of black-painted portions by the document black-painted portion display system according to the seventh embodiment.
  • The list may show, for each black-painted portion, its number, the page and line where it appears, the name of the black-painting policy, and the reason the policy was registered.
  • As in the sixth embodiment, the trained model used to predict the black-painted portions is generated from training data prepared separately for each of a plurality of groups (each corresponding to a predetermined institution, organization, or department, such as a ministry or agency), so each trained model reflects its own group's policy.
  • The black-painted portion list display unit 1422 may receive from the black-painted portion extraction unit 141 the extracted black-painted portions, together with the name of the policy associated with the training data used to generate the trained model and the reason that policy was registered, and hold them in list form.
  • When a list display designation is accepted, it is sent to the black-painted portion list display unit 1422 and held there.
  • The list of black-painted portions is then sent to the black-painted portion display unit 150, which displays it.
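The list of the seventh embodiment could be assembled as in the sketch below. It assumes, hypothetically, that portions are character offsets into plain text, that each line ends at a newline, and that every page holds a fixed number of lines (`lines_per_page`); a real system would take page and line numbers from the document's layout. The `Portion` tuple and the function names are illustrative, not from the source.

```python
from typing import List, NamedTuple, Tuple


class Portion(NamedTuple):
    """A black-painted portion with its associated policy (hypothetical structure)."""
    start: int  # character offset where the portion begins
    end: int
    policy_name: str
    reason: str


def page_and_line(text: str, offset: int, lines_per_page: int) -> Tuple[int, int]:
    """Map a character offset to a 1-based (page, line) pair, assuming fixed-size pages."""
    line_index = text.count("\n", 0, offset)  # newlines before the offset = 0-based line index
    return line_index // lines_per_page + 1, line_index % lines_per_page + 1


def build_list(text: str, portions: List[Portion], lines_per_page: int = 40) -> List[str]:
    """Return a header plus one row per portion: number, page/line, policy name, reason."""
    rows = ["No. | Page/Line | Policy | Reason"]
    for number, p in enumerate(sorted(portions, key=lambda q: q.start), start=1):
        page, line = page_and_line(text, p.start, lines_per_page)
        rows.append(f"{number} | {page}/{line} | {p.policy_name} | {p.reason}")
    return rows
```

Sorting by start offset numbers the portions in reading order, which matches the order in which a reader would encounter them on the page.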
  • The document black-painted portion display systems 1, 100, and 200 can be realized by a program that is installed on a computer (9000 in FIG. 20) and causes it to function as these systems by implementing their functions.
  • FIG. 20 illustrates such a computer, comprising a CPU (Central Processing Unit) 9010, a communication interface 9020, a memory 9030, and an auxiliary storage device 9040. The CPU 9010 may execute the black-painted portion display program and update the calculation parameters held in the auxiliary storage device 9040 or the like.
  • Each part (processing means or function) of the document black-painted portion display system shown in the first to seventh embodiments can be realized by a computer program that causes the processor of the computer to execute the processing described above using its hardware.
  • The trained model generation unit of the document black-painted portion display system described above trains the model using a neural network.
  • The neural network is preferably a deep neural network.
  • The neural network is preferably an RNN (Recurrent Neural Network), an LSTM (Long Short-Term Memory) network, a CNN (Convolutional Neural Network), or a combination of these.
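To illustrate how a recurrent network can score each token of a document for black-painting, here is a minimal forward pass of an Elman-style RNN in plain Python. It is an assumption-laden sketch, not the patent's model: the weights are random and untrained, biases are omitted, token IDs are assumed to come from some tokenizer, and the function name is hypothetical. The point is only that the hidden state `h` carries context from earlier tokens into each token's score; an LSTM would replace the `tanh` update with gated memory, and a CNN would instead score tokens from fixed-width windows.

```python
import math
import random
from typing import List


def rnn_redaction_scores(
    token_ids: List[int], vocab_size: int, hidden_size: int = 8, seed: int = 0
) -> List[float]:
    """Per-token probability of black-painting from an untrained Elman RNN (illustration only)."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> List[List[float]]:
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    w_xh = mat(vocab_size, hidden_size)  # input weights; row t acts as token t's embedding
    w_hh = mat(hidden_size, hidden_size)  # recurrent weights
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]  # hidden -> logit

    h = [0.0] * hidden_size
    scores = []
    for t in token_ids:
        if not 0 <= t < vocab_size:
            raise ValueError(f"token id {t} outside vocabulary of size {vocab_size}")
        # New hidden state mixes the current token with the previous hidden state.
        h = [
            math.tanh(w_xh[t][j] + sum(h[k] * w_hh[k][j] for k in range(hidden_size)))
            for j in range(hidden_size)
        ]
        logit = sum(w * hj for w, hj in zip(w_out, h))
        scores.append(1.0 / (1.0 + math.exp(-logit)))  # sigmoid -> probability in (0, 1)
    return scores
```

In practice a framework such as PyTorch would supply trainable recurrent layers; the loop above only shows the recurrence that such layers compute.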
  • The black-painting target document determination unit of the document black-painted portion display system described above extracts, for each of the plurality of simple sentences contained in the input text, a simple sentence with a similar meaning from documents that contain multiple simple sentences.
  • It generates discourse relation information, which indicates the discourse relation (the order in which events occur) between simple sentences, based on the order in which simple sentences appear before and after a given connective.
  • Based on the discourse relation information, it calculates the discourse relations between the simple sentences in the document and a discourse relation distance, defined as the number of crossings between the positions of the extracted simple sentences, and it makes its determination based on a value that includes the discourse relation distance and a predetermined threshold.
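The crossing count behind the discourse relation distance can be illustrated with the sketch below. It rests on two labeled assumptions: word-level Jaccard overlap stands in for the unspecified semantic-similarity measure, and "number of crossings" is read as the number of sentence pairs whose order in the input text is reversed in the matched document (an inversion count). The names `most_similar_index`, `discourse_relation_distance`, and `is_target`, and the rule that a distance at or below the threshold counts as a match, are hypothetical.

```python
from itertools import combinations
from typing import List


def most_similar_index(sentence: str, doc_sentences: List[str]) -> int:
    """Index of the document sentence with the highest word-level Jaccard similarity (stand-in metric)."""
    words = set(sentence.lower().split())

    def jaccard(other: str) -> float:
        other_words = set(other.lower().split())
        union = words | other_words
        return len(words & other_words) / len(union) if union else 0.0

    return max(range(len(doc_sentences)), key=lambda i: jaccard(doc_sentences[i]))


def discourse_relation_distance(positions: List[int]) -> int:
    """Count pairs that appear in one order in the input text but the opposite order in the document."""
    return sum(1 for earlier, later in combinations(positions, 2) if earlier > later)


def is_target(input_sentences: List[str], doc_sentences: List[str], threshold: int) -> bool:
    """Match each input sentence to the document, then compare the crossing count with the threshold."""
    positions = [most_similar_index(s, doc_sentences) for s in input_sentences]
    return discourse_relation_distance(positions) <= threshold
```

For input sentences matched to document positions `[1, 0, 2]`, only the first pair is reversed, so the distance is 1. The pairwise count is O(n²), which is fine for short texts; a merge-sort inversion count would reduce it to O(n log n).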
  • The document black-painted portion display system described above may further include a black-painted portion change reception unit that accepts changes to the display of the black-painted portions of the black-painting target document.
  • The black-painting target document may be a document containing text, a document containing images, or a document obtained by speech recognition means.
  • In the document black-painted portion display system described above, the black-painted document creation unit changes the black-painted portions determined by the black-painted portion extraction unit in accordance with the input accepted by the black-painted portion change reception unit.
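The interplay between the change reception unit and the black-painted document creation unit can be sketched as follows. This is a hypothetical illustration: black-painted portions are character-offset spans over plain text, the `RedactionSession` class and its methods are invented for the example, and each change is also appended to a history list, in the way the change history storage unit is described as recording changes.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Span = Tuple[int, int]  # [start, end) character offsets (hypothetical representation)


@dataclass
class RedactionSession:
    """Holds a document, its current black-painted spans, and the history of user changes."""
    text: str
    spans: Set[Span]
    history: List[Tuple[str, Span]] = field(default_factory=list)

    def add(self, span: Span) -> None:
        """Accept a user request to black-paint an additional span."""
        self.spans.add(span)
        self.history.append(("add", span))

    def remove(self, span: Span) -> None:
        """Accept a user request to un-black-paint a span (e.g. one predicted by the model)."""
        self.spans.discard(span)
        self.history.append(("remove", span))

    def render(self, mask: str = "\u2588") -> str:
        """Create the black-painted document by replacing every character inside a span with `mask`."""
        chars = list(self.text)
        for start, end in self.spans:
            for i in range(start, min(end, len(chars))):
                chars[i] = mask
        return "".join(chars)
```

Keeping the spans separate from the text means the original document is never overwritten, so a change can be undone and the history replayed later, for instance as retraining data.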
  • The document black-painted portion display system described above may further include a black-painted portion change history storage unit that stores changes to the display of the black-painted portions as a change history.
  • The trained model generation unit of the system described above preferably retrains the model using retraining data in which the history information accumulated in the black-painted portion change history storage unit serves as correct-answer data.
  • The black-painted portion reason display reception unit can accept an input designating a black-painted portion of the black-painting target document.
  • The black-painted portion display unit can display the black-painting reason held by the black-painted portion reason display unit in accordance with the input accepted by the black-painted portion reason display reception unit.
  • The black-painted portion list display reception unit can accept an input requesting that the black-painted portions be displayed as a list.
  • The black-painted portion display unit can display the list of black-painted portions held by the black-painted portion list display unit in accordance with the input accepted by the black-painted portion list display reception unit.
  • The black-painted portion list can show each black-painted portion, the page and line where it appears, the name of the policy associated with it, and the reason the policy was registered.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Bioethics (AREA)
  • Computer Security & Cryptography (AREA)
  • Document Processing Apparatus (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)
PCT/JP2020/021904 2020-06-03 2020-06-03 Document black-painted portion display system, method, and program Ceased WO2021245833A1 (ja)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2022529216A JP7513089B2 (ja) 2020-06-03 2020-06-03 Document black-painted portion display system, method, and program
PCT/JP2020/021904 WO2021245833A1 (ja) 2020-06-03 2020-06-03 Document black-painted portion display system, method, and program
US18/007,761 US20230334164A1 (en) 2020-06-03 2020-06-03 Document redacted part displaying system, document redacted part displaying method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/021904 WO2021245833A1 (ja) 2020-06-03 2020-06-03 Document black-painted portion display system, method, and program

Publications (1)

Publication Number Publication Date
WO2021245833A1 (ja) 2021-12-09

Family

ID=78830996

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/021904 Ceased WO2021245833A1 (ja) 2020-06-03 2020-06-03 Document black-painted portion display system, method, and program

Country Status (3)

Country Link
US (1) US20230334164A1
JP (1) JP7513089B2
WO (1) WO2021245833A1

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11954082B1 (en) * 2023-01-03 2024-04-09 Truist Bank User definable alternate display of log entries

Citations (4)

Publication number Priority date Publication date Assignee Title
JP2004145529A (ja) * 2002-10-23 2004-05-20 Hitachi Ltd Method for managing masked portions of documents for disclosure, apparatus for implementing the same, and processing program therefor
JP2005338903A (ja) * 2004-05-24 2005-12-08 Fujitsu Ltd Document disclosure method, program, and apparatus
US20190018983A1 (en) * 2017-07-17 2019-01-17 Microsoft Technology Licensing, Llc Removing Sensitive Content from Documents while Preserving their Usefulness for Subsequent Processing
JP6578941B2 (ja) * 2013-02-28 2019-09-25 NEC Corporation Entailment determination device, entailment determination method, and program

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
JP5460359B2 (ja) * 2010-01-29 2014-04-02 International Business Machines Corporation Apparatus, method, and program for supporting processing of character strings in a document
US8904554B2 (en) * 2010-03-30 2014-12-02 Private Access, Inc. System and method for selectively redacting information in electronic documents
US9195853B2 (en) * 2012-01-15 2015-11-24 International Business Machines Corporation Automated document redaction
US10083320B2 (en) * 2015-06-24 2018-09-25 Airwatch Llc Dynamic content redaction
US11127403B2 (en) * 2019-10-25 2021-09-21 Intuit Inc. Machine learning-based automatic detection and removal of personally identifiable information

Also Published As

Publication number Publication date
JP7513089B2 (ja) 2024-07-09
JPWO2021245833A1 2021-12-09
US20230334164A1 (en) 2023-10-19

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20939468

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
ENP Entry into the national phase

Ref document number: 2022529216

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20939468

Country of ref document: EP

Kind code of ref document: A1