WO2020175662A1 - Dictionary creating device, dictionary creating method, and dictionary creating program - Google Patents

Dictionary creating device, dictionary creating method, and dictionary creating program

Info

Publication number
WO2020175662A1
Authority
WO
WIPO (PCT)
Prior art keywords
dictionary
words
common word
item
synonym
Prior art date
Application number
PCT/JP2020/008190
Other languages
French (fr)
Japanese (ja)
Inventor
一也 谷川
Original Assignee
株式会社ミラボ
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社ミラボ

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management

Definitions

  • The present invention relates to a dictionary creating device, a dictionary creating method, and a dictionary creating program, and in particular to ones that create a synonym dictionary and/or a different-meaning-word dictionary for the words in item names used in forms.
  • Forms are typically paper media, but there is a demand to reduce the cost of managing forms by using input forms, which are electronic versions of the paper forms.
  • Patent Document 1 discloses a system that determines the type of a form and uses the input form according to the type of the form to perform the acceptance processing of the form.
  • Patent Document 1: Japanese Patent Laid-Open No. 2004-126910
  • The corresponding item names may differ depending on the local government or company. Consequently, when standardizing the item names of many types of forms, the list of item names becomes huge and organizing it manually is extremely labor-intensive. It is therefore desirable to set a standard item name for item names that are used with the same meaning in multiple forms, and, to further improve the accuracy of this standardization, to determine whether the words included in the item names are synonymous with each other.
  • The present invention has been made in view of the above problems, and its object is to provide a dictionary creating device, a dictionary creating method, and a dictionary creating program that create a synonym dictionary and a different-meaning-word dictionary for determining whether words in a plurality of item names used in a plurality of forms are synonyms or different-meaning words.
  • Means for solving the problem
  • The above problem is solved by a dictionary creating device for creating at least one of a synonym dictionary and a different-meaning-word dictionary of item names of forms, the device comprising: an item name acquisition unit that acquires a plurality of item names described in a plurality of forms; a first processing unit that classifies, based on a predetermined condition, one or more words included in each of the plurality of item names acquired by the item name acquisition unit and creates one or more common word groups; and a second processing unit that determines, for each common word group and based on information identifying the form, whether the words in the common word group are synonymous with or different in meaning from each other.
  • With this configuration, a synonym dictionary and a different-meaning-word dictionary can be created.
  • The first processing unit may classify, into the same common word group, the words other than the common word in item names that contain a word common to a plurality of the item names.
  • The second processing unit preferably determines that words in one common word group are synonymous when they are not used in the same form.
  • The item name acquisition unit may acquire, for each item name, form identification information identifying the form in which the acquired item name is described; a common word group storage unit may hold, for each word belonging to a common word group, the form identification information of the forms in which the word is described; and the second processing unit may determine that the words to be processed are synonyms when they have no form identification information in common.
  • The second processing unit may determine that the words to be processed are different-meaning words when they have form identification information in common.
  • The above problem is also solved by a dictionary creating method performed by a dictionary creating device for creating at least one of a synonym dictionary and a different-meaning-word dictionary, the method comprising: an item name acquisition step in which the dictionary creating device acquires a plurality of item names described in a plurality of forms; a first processing step of classifying, based on a predetermined condition, one or more words included in each of the plurality of item names acquired in the item name acquisition step and creating one or more common word groups; and a second processing step of determining, for each common word group and based on information identifying the form, whether the words in the common word group are synonymous with or different in meaning from each other.
  • The above problem is also solved by a dictionary creating program for creating at least one of a synonym dictionary and a different-meaning-word dictionary of item names of forms, the program causing a computer to function as: an item name acquisition unit that acquires a plurality of item names described in a plurality of forms; a first processing unit that classifies, based on a predetermined condition, one or more words included in each of the plurality of item names acquired by the item name acquisition unit and creates one or more common word groups; and a second processing unit that determines, for each common word group and based on information identifying the form, whether the words in the common word group are synonymous with or different in meaning from each other.
  • According to the present invention, a synonym dictionary and a different-meaning-word dictionary can be created for determining whether the words in a plurality of item names used in a plurality of forms are synonyms or different-meaning words.
  • FIG. 1 is a diagram showing an overall configuration of an information processing system.
  • FIG. 2 is a diagram for explaining the outline of the synonym/different-meaning dictionary creation process.
  • FIG. 3 is a functional block diagram of the dictionary creation device.
  • FIG. 4 is a flowchart of the dictionary creation process.
  • FIG. 5 is a flowchart of the dictionary creation process.
  • Hereinafter, a dictionary creating device 10 according to an embodiment of the present invention (hereinafter referred to as the present embodiment) will be described with reference to FIGS. 1 to 5.
  • the “form” is a paper medium or electronic medium that can be used to enter information and that is subjected to a prescribed process (procedure).
  • A “form” is used, for example, to submit an application to a local government such as a municipality, to the national government, or to a private company. Specifically, a birth notification, a pregnancy notification, and the like are examples of a “form”.
  • the “item name” is a component of the form and is information for defining the content and format of the input information to the form. For example, "child's name”, “child's date of birth”, etc. correspond to an example of the above "item name”.
  • “Synonyms” are two or more different words that have the same meaning as each other, in particular words that are used to indicate the same attribute in form items.
  • “Different-meaning words” are two or more different words that have different meanings from each other, in particular words that are used to indicate different attributes in form items.
  • A “synonym dictionary” is a collection of data containing information from which it can be determined that two or more words are synonyms of each other. For example, if two different words that both mean “child” are synonyms, and two different words that both mean “name” are synonyms, referring to the synonym dictionary makes it possible to determine that the words in each pair are synonymous.
  • A “different-meaning-word dictionary” is a collection of data containing information from which it can be determined that two or more words differ in meaning from each other. For example, if “child” and “mother” are different-meaning words, and “name” and “date of birth” are different-meaning words, referring to the different-meaning-word dictionary makes it possible to determine that these words differ in meaning.
  • Hereinafter, “synonyms” and “different-meaning words” are collectively referred to as “synonym/different-meaning words”, and the “synonym dictionary” and the “different-meaning-word dictionary” are collectively referred to as the “synonym/different-meaning dictionary”. The “synonym/different-meaning dictionary” is a collection of data comprising the above “synonym dictionary” and “different-meaning-word dictionary”.
  • The information processing system 1 includes a synonym/different-meaning dictionary creating device 10 (hereinafter referred to as “dictionary creating device 10”) and a form processing device 30.
  • The dictionary creating device 10 and the form processing device 30 are communicably connected via a network (not shown) such as the Internet or an intranet.
  • the form processing device 30 is connected to the scanner 40.
  • the scanner 40 is a device that captures image information by optically scanning a paper medium.
  • the scanner 40 outputs a scan image (image information) obtained by scanning the form to the form processing device 30.
  • The form processing device 30 is a computer that processes the form captured by the scanner 40. Specifically, the form processing device 30 executes OCR (optical character recognition) on the form to obtain the character strings described in the form. In addition, the form processing device 30 analyzes the table structure of the form P. More specifically, the form processing device 30 divides the form into item columns, input columns, and fill-in input columns, and analyzes information on the item names described in the item columns (and fill-in input columns).
  • Here, the item column is an area in which a character string serving as an item name is written; the input column is an area in which no character string is written and into which the information corresponding to the item column is entered; and the fill-in input column is an area in which character strings are written and information is entered between the character strings.
  • An input device 31 is connected to the form processing device 30, and information can be input via the input device 31.
  • A display device 32 is connected to the form processing device 30, and a UI screen or the like can be displayed on the display device 32.
  • Information on the plurality of types of forms P analyzed by the form processing device 30 is output to the dictionary creating device 10. The dictionary creating device 10 then creates a synonym dictionary and a different-meaning-word dictionary for determining whether the words in the item names used in the multiple types of forms P are synonyms or different-meaning words.
  • the dictionary creating device 10 is a computer that includes a processor 11, a storage device 12 and a communication interface 13 as hardware.
  • The processor 11 includes, for example, a central processing unit (CPU), executes various arithmetic processes based on the programs and data stored in the storage device 12, and controls each part of the dictionary creating device 10.
  • the storage device 12 is configured to include, for example, a memory and a magnetic disk device, stores various programs and data, and also functions as a work memory for the processor 11.
  • The communication interface 13 includes, for example, a network interface card (NIC) and is connected to the network via the NIC. The communication interface 13 communicates with devices such as the form processing device 30 via the network.
  • The dictionary creating device 10 acquires a form group consisting of a plurality of forms relating to various procedures. The form group includes forms for the same procedure that are used in more than one municipality. Even for the same procedure, the form and the item names used in it differ from municipality to municipality, so each of these forms is included in the form group.
  • Each form includes one or more item names such as “A”, “B”, and “C”.
  • An item name is a phrase that contains one or more words.
  • Each item name includes a form ID that identifies the form.
  • The dictionary creating device 10 extracts the item names from each form, together with identification information such as the procedure ID (procedure identification information) and the form ID (form identification information).
  • The whole set of item names extracted from the forms included in the form group is referred to as an item name group.
  • The dictionary creating device 10 classifies the item names I included in the item name group into common word groups (first process: common word group creation process).
  • Specifically, the dictionary creating device 10 acquires one procedure (procedure A) to be processed and takes, from the item names I included in the item name group, the item names I belonging to that procedure. The dictionary creating device 10 then decomposes item names I1 and I2 into words (morphemes) by morphological analysis and extracts the nouns. Hereinafter, nouns extracted by morphological analysis are called “words”.
  • The common word group includes the form ID corresponding to each word.
  • The dictionary creating device 10 carries out the first process for all item names I belonging to the procedure to be processed and creates the common word groups for the item names of procedure A. This process is then repeated for each procedure to create common word groups for all procedures.
  • The procedure A to be processed may be specified by user input.
  • Alternatively, the dictionary creating device 10 may extract and process only the procedure to be processed based on the procedure ID or the like of the item name group.
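The first process for a single procedure can be sketched roughly as follows. This is a minimal Python illustration, not the implementation described in the source: it assumes the nouns have already been extracted from each item name (the source uses morphological analysis for this), and the names `ItemName` and `build_common_word_groups` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class ItemName(NamedTuple):
    form_id: str             # form identification information (hypothetical values below)
    words: tuple[str, ...]   # nouns assumed to be already extracted from the item name


def build_common_word_groups(items: list[ItemName]) -> dict[str, dict[str, set[str]]]:
    """Return {common word: {paired word: {form IDs in which the word appears}}}."""
    groups: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for a, b in combinations(items, 2):
        # A word shared by two item names is treated as a common word.
        for common in set(a.words) & set(b.words):
            for item in (a, b):
                # The remaining words of each item name join the group,
                # together with the ID of the form they came from.
                for word in item.words:
                    if word != common:
                        groups[common][word].add(item.form_id)
    return {common: dict(members) for common, members in groups.items()}


items = [
    ItemName("form-1", ("child", "name")),
    ItemName("form-1", ("mother", "name")),
    ItemName("form-2", ("child", "date of birth")),
]
groups = build_common_word_groups(items)
```

Here "child's name" and "mother's name" share the word "name", so "child" and "mother" land in the "name" group, each tagged with the form in which it was found.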
  • Next, for each of the common word groups created in the first process, the dictionary creating device 10 determines whether the words in the group are likely to be synonyms or likely to be different-meaning words, and creates synonym candidates and different-meaning-word candidates (second process: synonym/different-meaning candidate creation process).
  • Specifically, the dictionary creating device 10 uses the form IDs to determine whether or not the words to be processed are used in the same form. When the words are used in the same form, the dictionary creating device 10 determines that they are likely to be “different-meaning words” and updates the synonym/different-meaning candidate storage unit with the words as different-meaning-word candidates. On the other hand, when the words are not used in the same form, it determines that they are likely to be “synonyms” and updates the synonym/different-meaning candidate storage unit with the words as synonym candidates.
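The rule above can be sketched as follows, reading it as: two words of the same common word group that share a form ID are treated as different-meaning candidates, and words that never share a form are treated as synonym candidates. The function names and the word "infant" are assumptions made for illustration only.

```python
from itertools import combinations


def classify_pair(forms_a: set, forms_b: set) -> str:
    """Classify two words of one common word group by the forms they appear in."""
    if forms_a & forms_b:
        # Both words occur in at least one common form.
        return "different-meaning candidate"
    return "synonym candidate"


def create_candidates(group: dict) -> dict:
    """Apply the rule to every unordered pair of words in a common word group."""
    return {
        frozenset((a, b)): classify_pair(group[a], group[b])
        for a, b in combinations(sorted(group), 2)
    }


# Words paired with the common word "name", with the forms they appear in.
name_group = {
    "child": {"form-1"},
    "mother": {"form-1"},
    "infant": {"form-2"},  # hypothetical word from another municipality's form
}
candidates = create_candidates(name_group)
```

In this sketch "mother" and "infant" also come out as a synonym candidate because they never share a form; a pair like this is the kind of result that a later approval step can reject.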
  • The dictionary creating device 10 presents the synonym/different-meaning dictionary candidates created in the second process to the user and accepts approval input. Specifically, the dictionary creating device 10 displays the candidate information on a display unit provided in the dictionary creating device 10 or on a display device or the like connected via a communication line, and receives input from an input device connected directly or via a communication line.
  • The dictionary creating device 10 accepts the approval input from the user, reflects the approval/rejection information for each candidate in the synonym/different-meaning candidates, and creates or updates the final synonym/different-meaning dictionary (synonym/different-meaning dictionary update process).
  • In the present embodiment, the candidates are created first, and approval or rejection of each candidate is accepted to determine the final synonym/different-meaning dictionary.
  • Alternatively, the created candidates may be adopted as the synonym/different-meaning dictionary as they are.
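The approval step can be sketched as a filter over the candidates. This is a minimal illustration under stated assumptions: the function name `finalize_dictionary`, the use of unordered word pairs as keys, and the choice to treat undecided candidates as rejected are not from the source.

```python
from typing import Dict, FrozenSet, Optional

Pair = FrozenSet[str]  # an unordered pair of words


def finalize_dictionary(
    candidates: Dict[Pair, str],
    approvals: Optional[Dict[Pair, bool]] = None,
) -> Dict[Pair, str]:
    """Reflect approve/reject input on the candidates.

    With no approval input, the candidates are adopted unchanged.
    Candidates without an explicit decision are treated as rejected
    (an assumption of this sketch).
    """
    if approvals is None:
        return dict(candidates)
    return {
        pair: relation
        for pair, relation in candidates.items()
        if approvals.get(pair, False)
    }


candidates = {
    frozenset({"child", "mother"}): "different-meaning word",
    frozenset({"mother", "infant"}): "synonym",  # hypothetical false positive
}
approvals = {
    frozenset({"child", "mother"}): True,
    frozenset({"mother", "infant"}): False,
}
final_dictionary = finalize_dictionary(candidates, approvals)
```

Calling `finalize_dictionary(candidates)` without approvals corresponds to adopting the candidates as the dictionary unchanged.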
  • As described above, the dictionary creating device 10 determines whether the words in the item names acquired from a plurality of forms belonging to a procedure are synonyms or different-meaning words, and creates a synonym/different-meaning dictionary. The created dictionary can be used when standardizing the differing item names of the different forms that a plurality of local governments use for the same procedure.
  • The series of processes may also be learned by a machine learning model. Learning in this way makes it possible to build a more automated and efficient dictionary generation function.
  • FIG. 3 shows a functional block diagram of the dictionary creating device 10.
  • The dictionary creating device 10 has, as functions, an item name storage unit 20A, a common word group storage unit 20B, a synonym/different-meaning candidate storage unit 20C, a synonym/different-meaning dictionary storage unit 20D, an item name acquisition unit 21A, a first processing unit 21B, a second processing unit 21C, a presentation unit 21D, a reception unit 21E, and an updating unit 21F.
  • The functions of the above units of the dictionary creating device 10 are realized by the processor 11 operating each part of the dictionary creating device 10 according to a program (dictionary creating program) stored in the storage device 12.
  • The program may be acquired by the dictionary creating device 10 over a communication network through the communication interface, or the dictionary creating device 10 may read it from a storage medium on which the program is stored.
  • In other words, the processor 11 of the dictionary creating device 10 operates according to the dictionary creating program and thereby implements the dictionary creating method according to the present invention. The details of the functions of the above units are described below.
  • The item name storage unit 20A stores information on the item names extracted from the forms included in the form group acquired by the dictionary creating device 10. The item name storage unit 20A is mainly realized by the storage device 12 of the dictionary creating device 10.
  • The item name storage unit 20A is realized by an item name table (not shown) stored in the storage device 12.
  • The item name table stores, for each item name, the item name, the form identification information of the form from which the item name was extracted, and the procedure identification information of the procedure to which the form belongs.
  • The form identification information and the procedure identification information are, for example, a form ID and a procedure ID. Even for forms used in the same procedure, different form identification information is assigned when the users of the form, such as local governments, national bodies, and companies, differ.
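The item name table can be pictured as a list of records with three fields. The following is a minimal sketch assuming an in-memory list; the class name `ItemNameRow`, the helper `rows_for_procedure`, and all ID values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemNameRow:
    item_name: str      # e.g. "child's name"
    form_id: str        # form identification information
    procedure_id: str   # procedure identification information


# Two municipalities use different forms (different form IDs) for the same procedure.
item_name_table = [
    ItemNameRow("child's name", "form-city-a", "birth-notification"),
    ItemNameRow("child's date of birth", "form-city-b", "birth-notification"),
    ItemNameRow("mother's name", "form-city-c", "pregnancy-notification"),
]


def rows_for_procedure(table, procedure_id):
    """Select only the item names that belong to the procedure to be processed."""
    return [row for row in table if row.procedure_id == procedure_id]


birth_rows = rows_for_procedure(item_name_table, "birth-notification")
```

Filtering by procedure ID in this way is one simple means of restricting processing to a single procedure before common word groups are built.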
  • The common word group storage unit 20B stores information on the one or more common word groups created by the dictionary creating device 10.
  • The common word group storage unit 20B is mainly realized by the storage device 12 of the dictionary creating device 10.
  • The common word group storage unit 20B is realized by a common word group table (not shown) stored in the storage device 12.
  • The common word group table stores, for example, common word names, words, and the form identification information of forms.
  • A common word name is the name of the common word of a common word group. For example, in the “name” group, the common word is “name”.
  • The word is a word that is a member of the common word group. For example, when the item name "child's name" is classified into the “name” group in the first process, the word is the one paired with the common word, that is, "child", which formed the item name together with the common word.
  • The form identification information is stored for each word and is the same as the form identification information in the item name storage unit 20A. If one word is used in multiple forms, multiple pieces of form identification information are stored for that word.
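One way to picture the common word group table is as a relational table with one row per (common word, word, form) combination. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the choice of SQLite, the table and column names, and the form IDs are assumptions, since the source only states that a table is stored in the storage device.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE common_word_group (
        common_word TEXT NOT NULL,
        word        TEXT NOT NULL,
        form_id     TEXT NOT NULL,
        PRIMARY KEY (common_word, word, form_id)
    )
    """
)
rows = [
    ("name", "child", "form-1"),
    ("name", "child", "form-2"),   # one word used in two forms -> two rows
    ("name", "mother", "form-1"),
]
conn.executemany("INSERT INTO common_word_group VALUES (?, ?, ?)", rows)


def form_ids(connection, common_word, word):
    """Return every form ID stored for a word within a common word group."""
    cursor = connection.execute(
        "SELECT form_id FROM common_word_group "
        "WHERE common_word = ? AND word = ? ORDER BY form_id",
        (common_word, word),
    )
    return [row[0] for row in cursor]


child_forms = form_ids(conn, "name", "child")
```

Storing one row per form keeps the "multiple form IDs per word" case simple: a word used in several forms simply has several rows.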
  • The synonym/different-meaning candidate storage unit 20C stores data (not shown), created by the dictionary creating device 10, that contains information identifying synonym candidate words and information identifying different-meaning candidate words.
  • The synonym/different-meaning candidate storage unit 20C is mainly realized by the storage device 12 of the dictionary creating device 10.
  • As an example, the synonym/different-meaning candidate storage unit 20C stores the same kind of contents as the synonym/different-meaning dictionary storage unit 20D described below.
  • The synonym/different-meaning dictionary storage unit 20D is realized by a synonym/different-meaning dictionary table (not shown) stored in the storage device 12.
  • The synonym/different-meaning dictionary storage unit 20D stores synonym dictionary data (not shown) containing information that identifies the synonymous words determined by the dictionary creating device 10, and different-meaning-word dictionary data (not shown) containing information that identifies the different-meaning words.
  • The synonym/different-meaning dictionary storage unit 20D is mainly realized by the storage device 12 of the dictionary creating device 10.
  • The synonym/different-meaning dictionary storage unit 20D stores, for example, word 1, word 2, the procedure, and the relation between word 1 and word 2.
  • As the relation between word 1 and word 2, for example, “synonym”, “different-meaning word”, “intra-procedure synonym”, “intra-procedure different-meaning word”, and the like are stored according to the determination or approval result.
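A dictionary table with these columns can be sketched as below. The sketch reads the four relation values as synonym, different-meaning word, and their intra-procedure variants, which is an interpretation of the translated text; the class names, the `look_up` helper, and the procedure ID are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Relation(Enum):
    SYNONYM = "synonym"
    DIFFERENT_MEANING = "different-meaning word"
    INTRA_PROCEDURE_SYNONYM = "intra-procedure synonym"
    INTRA_PROCEDURE_DIFFERENT_MEANING = "intra-procedure different-meaning word"


@dataclass(frozen=True)
class DictionaryRow:
    word1: str
    word2: str
    procedure_id: str
    relation: Relation


def look_up(table, a: str, b: str) -> Optional[Relation]:
    """Return the stored relation for two words, regardless of their order."""
    for row in table:
        if {row.word1, row.word2} == {a, b}:
            return row.relation
    return None


dictionary_table = [
    DictionaryRow("child", "mother", "birth-notification", Relation.DIFFERENT_MEANING),
    DictionaryRow("name", "date of birth", "birth-notification", Relation.DIFFERENT_MEANING),
]
relation = look_up(dictionary_table, "mother", "child")
```

Comparing the two words as a set makes the lookup independent of which word was stored as word 1 and which as word 2.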
  • The item name acquisition unit 21A executes the above-mentioned item name acquisition process to acquire a plurality of item names described in a plurality of forms.
  • The item name acquisition unit 21A is mainly realized by the processor 11, the storage device 12, and the communication interface 13 of the dictionary creating device 10.
  • The process executed by the item name acquisition unit 21A corresponds to the item name acquisition step.
  • Specifically, the processor 11 acquires the analysis results of a plurality of forms to be processed from the form processing device 30 via the communication interface 13.
  • The analysis results of the plurality of forms include character string data of one or more item names obtained from the forms by optical character recognition, procedure identification information, and form identification information.
  • The item name acquisition unit 21A acquires a plurality of item names described in a plurality of forms used by different local governments for the same procedure.
  • Procedure identification information and form identification information, such as a procedure ID and a form ID, which identify the procedure and the form from which each item name was extracted, are acquired together with the item name.
  • The procedure ID and the form ID can be obtained from information entered by the user when the form is imported.
  • The item name acquisition unit 21A may instead acquire image data of a plurality of forms from the form processing device 30 and obtain the character string data of the item names from the acquired images by predetermined image processing.
  • The first processing unit 21B executes the above-described first process: it classifies the one or more words contained in each of the plurality of item names acquired by the item name acquisition unit 21A into one or more common word groups, thereby creating the common word groups.
  • The first processing unit 21B is mainly realized by the processor 11 and the storage device 12 of the dictionary creating device 10.
  • The processing executed by the first processing unit 21B corresponds to the first processing step.
  • The first processing unit 21B groups together, for each common word, the words other than the common word in the item names that contain a word common to a plurality of item names, that is, the words that are used in a pair with the common word to make up one item name.
  • The second processing unit 21C executes the above-mentioned second process: for each of the common word groups created in the first process, it determines whether the words in the group are likely to be synonyms or likely to be different-meaning words, and creates synonym candidates and different-meaning-word candidates (synonym/different-meaning candidates).
  • The second processing unit 21C is mainly realized by the processor 11 and the storage device 12 of the dictionary creating device 10.
  • The processing executed by the second processing unit 21C corresponds to the second processing step.
  • Specifically, the second processing unit 21C determines whether words are synonymous or different in meaning based on the form identification information, which is information for identifying the form. If the words to be processed do not have form identification information in common, they are determined to be synonyms; if they do have form identification information in common, they are determined to be different-meaning words.
  • The presentation unit 21D displays the synonym/different-meaning candidates created in the second process on the display device 32 and presents them.
  • The presentation unit 21D is mainly realized by the processor 11, the storage device 12, and the communication interface 13 of the dictionary creating device 10.
  • Specifically, the processor 11 transmits the synonym and/or different-meaning-word candidates stored in the synonym/different-meaning candidate storage unit to the form processing device 30 via the communication interface 13 and displays them on the display device 32 of the form processing device 30.
  • Alternatively, the processor 11 may display the candidates on a display device attached to the dictionary creating device 10 instead of transmitting them to the form processing device 30.
  • The reception unit 21E accepts, from the form processing device 30, information such as the approval or rejection of the synonym/different-meaning candidates input by the user.
  • Specifically, the processor 11 receives the input information from the form processing device 30 via the communication interface 13.
  • The reception unit 21E is mainly realized by the processor 11, the storage device 12, and the communication interface 13 of the dictionary creating device 10.
  • The updating unit 21F reflects the approval/rejection information received by the reception unit 21E in the data of the synonym/different-meaning candidates created by the second processing unit 21C, and creates or updates the final synonym/different-meaning dictionary.
  • The updating unit 21F is mainly realized by the processor 11 and the storage device 12 of the dictionary creating device 10.
  • The process executed by the updating unit 21F corresponds to the dictionary creating/updating process.
  • The dictionary creating device 10 initializes the variable that indexes the procedures to 1 (S1) and selects a procedure (S2). The procedure may be selected by receiving input from the user.
  • Next, the dictionary creating device 10 initializes the variable that indexes the item names to 1 (S3), acquires an item name I belonging to the selected procedure (S4), and extracts the nouns (words) contained in the item name by morphological analysis (S5). The dictionary creating device 10 then selects another item name belonging to the procedure (S6) and similarly extracts its words by morphological analysis.
  • The dictionary creating device 10 compares the two sets of extracted words to determine whether they share a common word (S8). When there is no common word (S8; No), the process ends. When there is a common word (S8; Yes), the device searches the common word group storage unit 20B to check whether a common word group for that common word has already been created (S9).
  • If the common word group already exists, the words of both item names and the form ID of each word are stored in that common word group (S10).
  • Otherwise, the dictionary creating device 10 creates a new common word group and stores the words of both item names and the form ID of each word in it (S11).
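  • The dictionary creating device 10 then determines whether or not all item names belonging to the procedure have been processed, and repeats the above steps for the remaining item names.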
  • At S16, the dictionary creating device 10 determines whether or not all of the plurality of procedures have been processed. If processing has not been completed for all procedures, the process proceeds to S17, where 1 is added to the procedure variable. If processing has been completed for all procedures, the process ends.
  • the dictionary creating device 10 executes the process shown in FIG. 5 for each of the common word groups created as described above.
  • The dictionary creating device 10 initializes the procedure variable and the common word group variable (S21) and acquires the procedure (S22).
  • The dictionary creating device 10 selects a common word group (S23).
  • The dictionary creating device 10 determines whether the calculated count, which reflects whether the words being processed are used in the same form, is greater than 0 (zero) (S27). When it is greater than 0 (S27; Yes), the words are determined to be different-meaning words and are written as such to the synonym/different-meaning candidate storage unit (S28), and the process proceeds to S30. On the other hand, when the count is 0 (S27; No), the words are determined to be synonyms and are written as such to the synonym/different-meaning candidate storage unit.
  • the dictionary creating device 10 judges whether or not the current word is the last word (S30); when the processing has not been completed for all the words (S30; No), it adds 1 to the word index (S31) and proceeds to S25.
  • the process proceeds to S32.
  • at S34, it is determined whether or not the process has been executed for all of the plurality of procedures s (S34). If processing for all procedures is not completed (S34; No), the process proceeds to S35 and adds 1 to s. When the processing for all the procedures is completed, the processing ends.
  • whether the words in a common word group are synonyms or different-meaning words is determined based on whether or not the words being processed are used in the same form.
  • the process shown in FIG. 5 is one example of a process for determining whether or not words are used in the same form; the process is not limited to this and may be any process that can make that determination.
  • the series of processes can also be learned as a machine learning model. Learning them in this way makes it possible to build a more automated and efficient dictionary generation function.
  • the present invention is not limited to the above embodiment.
  • the dictionary creation device 10 and the form processing device 30 may be configured as one device.
  • the dictionary creating device 10 is not limited to a single computer and may be composed of multiple computers.

Abstract

The objective of the present invention is to create a dictionary for determining whether words in a plurality of item names used in a plurality of forms are synonyms or are words having different meanings. A dictionary creating device 10 for creating at least one of a synonym dictionary and a dictionary of words that are different from one another, in item names in forms, is provided with: an item name acquiring unit 21A for acquiring a plurality of item names mentioned in a plurality of forms; a first processing unit 21B for classifying, on the basis of prescribed criteria, one or a plurality of words included in each of the plurality of item names acquired by the item name acquiring unit 21A, and for creating one or a plurality of common word groups; and a second processing unit 21C for determining, on the basis of information identifying a form, whether the words in each common word group have the same meaning or different meanings from one another, for each common word group.

Description

WO 2020/175662    PCT/JP2020/008190
Specification
Title of invention:
Dictionary creating device, dictionary creating method, and dictionary creating program
Technical field
[0001] The present invention relates to a dictionary creating device, a dictionary creating method, and a dictionary creating program, and in particular to a device, method, and program that create a synonym dictionary and/or a different-meaning word dictionary for the words in item names used in forms.
Background art
[0002] Local governments, companies, and the like use a large number of forms. Forms are generally paper media, but it is desirable to reduce the cost of managing them by using input forms that are digitized versions of the paper forms.
[0003] For example, Patent Document 1 below discloses a system that determines the type of a form and performs acceptance processing of the form using an input form corresponding to that type.
Prior art documents
Patent literature
[0004] Patent Document 1: Japanese Unexamined Patent Application Publication No. 2004-126910
Summary of the invention
Problems to be solved by the invention
[0005] However, even for forms of the same type, the names of corresponding items (item names) may differ depending on the local government, company, or the like. Consequently, when attempting to standardize item names across many types of forms, the list of item names becomes enormous, and organizing it manually requires a great deal of labor. It is therefore desirable to set standard item names for item names that are used with the same meaning in a plurality of forms and, to further improve the accuracy of that standardization, to be able to determine whether the words contained in item names are synonyms of one another or different-meaning words.
[0006] The present invention has been made in view of the above problem, and its object is to provide a dictionary creating device, a dictionary creating method, and a dictionary creating program that create a synonym dictionary and a different-meaning word dictionary for determining whether the words in a plurality of item names used in a plurality of forms are synonyms of one another or different-meaning words.
Means for solving the problem
[0007] According to the dictionary creating device of the present invention, the above problem is solved by a dictionary creating device that creates at least one of a synonym dictionary and a different-meaning word dictionary for item names of forms, the device comprising: an item name acquisition unit that acquires a plurality of item names written in a plurality of forms; a first processing unit that classifies, based on a predetermined condition, one or more words included in each of the plurality of item names acquired by the item name acquisition unit, and creates one or more common word groups; and a second processing unit that determines, for each common word group and based on information identifying the forms, whether the words in the common word group are synonymous with or different in meaning from one another.
With the above configuration, it is possible to determine whether the words in a plurality of item names used in a plurality of forms are synonyms of one another or different-meaning words, and to create a synonym dictionary and a different-meaning word dictionary based on the result.
[0008] In the above dictionary creating device, the first processing unit may classify, into the same common word group, the words other than the common word in item names that contain a word common to a plurality of the item names.
[0009] In the above dictionary creating device, the second processing unit may determine that words in one common word group are synonyms when those words are not used in the same form.
In general, the same item name rarely appears more than once in a single form. Therefore, when words other than the common word, that is, words used in a pair with the common word, are not used in the same form, they can be determined to be synonyms. In this way, a synonym dictionary can be created for the words in a common word group other than the common word, that is, the words used in a pair with the common word.
In addition, by having this series of processes learned as a machine learning model, a more automated and efficient dictionary generation function can be constructed.
[0010] In the above dictionary creating device, the item name acquisition unit may acquire, for each item name, form identification information identifying the form in which the acquired item name was written. The common word group is stored in a common word group storage unit and holds the words belonging to the group and, for each word, the form identification information of the form in which the word was written. The second processing unit may determine that the words being processed are synonyms when they have no form identification information in common.
In this way, a synonym dictionary can be created for the words that are used in a pair with a common word and classified into a common word group.
[0011] In the above dictionary creating device, the second processing unit determines that the words being processed are different-meaning words when they have form identification information in common.
In this way, a different-meaning word dictionary can be created for the words that are used in a pair with a common word and classified into a common word group.
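The arrangement in paragraphs [0010] and [0011] can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the class name `CommonWordGroup`, its fields, and the method `judge` are hypothetical and do not appear in the specification, and form identification information is represented as plain strings.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class CommonWordGroup:
    """Hypothetical container for one common word group."""
    common_word: str
    # word paired with the common word -> IDs of the forms in which it was written
    form_ids: Dict[str, Set[str]] = field(default_factory=dict)

    def add(self, word: str, form_id: str) -> None:
        """Record that `word` appeared together with the common word in `form_id`."""
        self.form_ids.setdefault(word, set()).add(form_id)

    def judge(self, w1: str, w2: str) -> str:
        """Shared form ID -> 'different-meaning'; no shared form ID -> 'synonym'."""
        shared = self.form_ids[w1] & self.form_ids[w2]
        return "different-meaning" if shared else "synonym"


group = CommonWordGroup("full name")
group.add("child", "form-1")
group.add("mother", "form-1")
group.add("juvenile", "form-2")
```

Here `group.judge("child", "mother")` reports different-meaning words because both were written in `form-1`, while `group.judge("child", "juvenile")` reports synonyms because their form IDs do not overlap.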
[0012] According to the dictionary creating method of the present invention, the above problem is solved by a dictionary creating method performed by a dictionary creating device for creating at least one of a synonym dictionary and a different-meaning word dictionary, the method comprising, performed by the dictionary creating device: an item name acquisition step of acquiring a plurality of item names written in a plurality of forms; a first processing step of classifying, based on a predetermined condition, one or more words included in each of the plurality of item names acquired in the item name acquisition step, and creating one or more common word groups; and a second processing step of determining, for each common word group and based on information identifying the forms, whether the words in the common word group are synonymous with or different in meaning from one another.
In this way, it is possible to determine whether the words in a plurality of item names used in a plurality of forms are synonyms of one another or different-meaning words, and to create a synonym dictionary and a different-meaning word dictionary based on the result.
[0013] According to the dictionary creating program of the present invention, the above problem is solved by a dictionary creating program that creates at least one of a synonym dictionary and a different-meaning word dictionary for item names of forms, the program causing a computer to function as: an item name acquisition unit that acquires a plurality of item names written in a plurality of forms; a first processing unit that classifies, based on a predetermined condition, one or more words included in each of the plurality of item names acquired by the item name acquisition unit, and creates one or more common word groups; and a second processing unit that determines, for each common word group and based on information identifying the forms, whether the words in the common word group are synonymous with or different in meaning from one another.
In this way, it is possible to determine whether the words in a plurality of item names used in a plurality of forms are synonyms of one another or different-meaning words, and to create a synonym dictionary and a different-meaning word dictionary based on the result.
Effect of the invention
[0014] According to the present invention, it is possible to create a synonym dictionary and a different-meaning word dictionary for determining whether the words in a plurality of item names used in a plurality of forms are synonyms of one another or different-meaning words.
In addition, by having this series of processes learned as a machine learning model, a more automated and efficient dictionary generation function can be constructed.
Brief description of the drawings
[0015] [FIG. 1] A diagram showing the overall configuration of the information processing system.
[FIG. 2] A diagram outlining the synonym/different-meaning word dictionary creation process.
[FIG. 3] A functional block diagram of the dictionary creating device.
[FIG. 4] A flow chart of the dictionary creation process.
[FIG. 5] A flow chart of the dictionary creation process.
Mode for carrying out the invention
[0016] Hereinafter, a dictionary creating device 10 according to an embodiment of the present invention (hereinafter, the present embodiment) will be described with reference to FIGS. 1 to 5.
The embodiment described below is merely an example to facilitate understanding of the present invention and does not limit it. That is, the system configuration, data, processing, and the like described below may be changed and improved without departing from the spirit of the present invention, and the present invention includes their equivalents.
[0017] The terms used below are explained here.
A "form" is a paper or electronic medium on which information can be entered and which is submitted for a prescribed process (procedure). For example, something used to file an application addressed to a local government such as a municipality, the national government, a private company, or the like corresponds to a "form". Specifically, a birth registration, a pregnancy notification, and the like are examples of a "form".
An "item name" is a component of a form and is information that defines the content and format of the information to be entered on the form. For example, "child's full name" and "child's date of birth" are examples of an "item name".
"Synonyms": when two or more different words have the same meaning as one another, and in particular when they are used in form items as words indicating the same attribute, these words are called synonyms.
"Different-meaning words": when two or more different words have meanings that differ from one another, and in particular when they are used in form items as words indicating different attributes, these words are called different-meaning words.
A "synonym dictionary" is a collection of data containing information from which it can be determined that two or more words are synonyms of one another. For example, if "child" (子ども) and "juvenile" (児童), and "full name" (氏名) and "name" (名前), are respectively synonyms, the synonymous relationship between these words can be determined by referring to the synonym dictionary.
A "different-meaning word dictionary" is a collection of data containing information from which it can be determined that two or more words are different-meaning words. For example, if "child" and "mother", and "full name" and "date of birth", are respectively different-meaning words, the different-meaning relationship between these words can be determined by referring to the different-meaning word dictionary.
In the following, "synonyms" and "different-meaning words" are also collectively called "synonym/different-meaning words", and the "synonym dictionary" and the "different-meaning word dictionary" are also collectively called the "synonym/different-meaning word dictionary". The "synonym/different-meaning word dictionary" means either the two separate data collections, one for the "synonym dictionary" and one for the "different-meaning word dictionary",
or a single data collection containing information from which both synonymous and different-meaning relationships can be determined; the term covers both cases.
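As a rough illustration of the second variant described above (a single data collection from which both relationships can be determined), the following Python sketch stores unordered word pairs with a relation label. The names `Relation`, `RelationDictionary`, `register`, and `lookup` are assumptions made for this example and are not taken from the specification.

```python
from enum import Enum
from typing import Dict, FrozenSet, Optional


class Relation(Enum):
    SYNONYM = "synonym"
    DIFFERENT_MEANING = "different-meaning"


class RelationDictionary:
    """Hypothetical combined synonym/different-meaning word dictionary."""

    def __init__(self) -> None:
        # An unordered pair of words maps to the relation between them.
        self._pairs: Dict[FrozenSet[str], Relation] = {}

    def register(self, a: str, b: str, relation: Relation) -> None:
        self._pairs[frozenset((a, b))] = relation

    def lookup(self, a: str, b: str) -> Optional[Relation]:
        """Return the stored relation, or None if the pair is not registered."""
        return self._pairs.get(frozenset((a, b)))


dictionary = RelationDictionary()
dictionary.register("child", "juvenile", Relation.SYNONYM)
dictionary.register("child", "mother", Relation.DIFFERENT_MEANING)
```

Keying on a `frozenset` makes the lookup independent of word order, so `lookup("juvenile", "child")` and `lookup("child", "juvenile")` return the same entry. Two separate collections, one per relation, would serve equally well for the first variant.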
[0018] [Configuration of information processing system 1]
As shown in FIG. 1, the information processing system 1 includes a synonym/different-meaning word dictionary creating device 10 (hereinafter, "dictionary creating device 10") and a form processing device 30. The dictionary creating device 10 and the form processing device 30 are communicably connected via a network (not shown) such as the Internet or an intranet.
[0019] The form processing device 30 is connected to a scanner 40.
The scanner 40 is a device that captures image information by optically scanning a paper medium. In the present embodiment, the scanner 40 outputs a scanned image (image information) of a form P to the form processing device 30.
The forms P are standardized documents such as ledgers, slips, and application forms. In the present embodiment, many kinds of forms P are captured by the scanner 40 and output to the form processing device 30. The plurality of forms P to be processed when setting standard item names are all forms for the same procedure. Specifically, for the procedure of birth registration, for example, the respective forms P used by various local governments are processed.
[0020] The form processing device 30 is a computer that processes the forms P captured by the scanner 40. Specifically, the form processing device 30 executes OCR (optical character recognition) on a form P to obtain the character strings written on it. The form processing device 30 also analyzes the table structure of the form P based on the arrangement of ruled lines and character strings. More specifically, the form processing device 30 divides the form P into item fields, input fields, and fill-in-the-blank input fields, and analyzes the item name information written in the item fields (and also in the fill-in-the-blank input fields).
An item field is an area in which a character string serving as an item name is written, and an input field is an area in which no character string is written and into which the information corresponding to the item field is entered. A fill-in-the-blank input field is an area in which character strings are written and information is entered between them.
[0021] An input device 31 is connected to the form processing device 30, and information can be entered through the input device 31. A display device 32 is also connected to the form processing device 30, and UI screens and the like can be displayed on the display device 32.
[0022] In the present embodiment, the form processing device 30 outputs the information on the plurality of types of forms P it has analyzed to the dictionary creating device 10. The dictionary creating device 10 then creates a synonym dictionary and a different-meaning word dictionary for determining whether the words in the item names used in the plurality of types of forms P are synonyms of one another or different-meaning words.
[0023] Next, the configuration of the dictionary creating device 10 will be described.
As shown in FIG. 1, the dictionary creating device 10 is a computer that includes, as hardware, a processor 11, a storage device 12, and a communication interface 13.
[0024] The processor 11 includes, for example, a central processing unit (CPU), executes various arithmetic processes based on the programs and data stored in the storage device 12, and controls each part of the dictionary creating device 10.
[0025] The storage device 12 includes, for example, a memory and a magnetic disk device, stores various programs and data, and also functions as a work memory for the processor 11.
[0026] The communication interface 13 has a communication interface such as a network interface card (NIC) and connects to the network through it. The communication interface 13 communicates with devices such as the form processing device 30 via the network.
[0027] [Outline of processing executed by the dictionary creating device 10]
An outline of the processing executed by the dictionary creating device 10 will now be described with reference to FIG. 2.
[0028] As shown in FIG. 2, the dictionary creating device 10 acquires a form group PG consisting of a plurality of forms P relating to various procedures.
The plurality of forms P include forms for the same procedure that are used by a plurality of local governments. Even for the same procedure, the layout of the form and the item names used differ from one local government to another, so each of those forms is included in the form group PG. Each form P contains one or more item names I, such as "A", "B", and "C". An item name I is a phrase containing one or more words. Each item name I further includes a form ID that identifies the form.
[0029] The dictionary creating device 10 then extracts the item names I from each form P. At this time, so that it can be determined which form of which procedure each item name was extracted from, procedure identification information and form identification information, such as a procedure ID and a form ID, are acquired together with the item name. The entire set of item names I extracted from the forms P in the form group PG is referred to as the item name group IG.
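The extraction in paragraph [0029] can be sketched in Python as follows. This is a simplified illustration: the record type `ItemName`, the function `collect_item_names`, and the dictionary layout used for each analyzed form are hypothetical, and the OCR and table analysis performed by the form processing device 30 are assumed to have already produced the item name strings.

```python
from typing import Dict, Iterable, List, NamedTuple


class ItemName(NamedTuple):
    """One item name together with its identification information."""
    procedure_id: str
    form_id: str
    text: str


def collect_item_names(form_group: Iterable[Dict]) -> List[ItemName]:
    """Flatten a form group into an item name group, keeping procedure and form IDs."""
    item_name_group = []
    for form in form_group:
        for text in form["item_names"]:
            item_name_group.append(
                ItemName(form["procedure_id"], form["form_id"], text)
            )
    return item_name_group


# Assumed shape of the analyzed forms (illustrative data only).
form_group = [
    {"procedure_id": "birth-registration", "form_id": "form-1",
     "item_names": ["child's full name", "mother's full name"]},
    {"procedure_id": "birth-registration", "form_id": "form-2",
     "item_names": ["juvenile's full name"]},
]
item_name_group = collect_item_names(form_group)
```

Carrying the procedure ID and form ID on every record is what later allows words to be compared by the forms in which they appeared.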
[0030] Next, the dictionary creating device 10 classifies the item names I included in the item name group IG into common word groups (first process: common word group creation process).
In the first process, the dictionary creating device 10 acquires one procedure to be processed (procedure A) and, among the item names I included in the item name group IG, takes the item names I belonging to procedure A and puts item names that have a common word (noun) into a common group.
[0031] Specifically, the dictionary creating device 10 extracts the nouns from among the words (morphemes) obtained by decomposing each of the item names I1 and I2 by morphological analysis. Hereinafter, a noun extracted by morphological analysis is referred to as a "word".
When the two item names I1 and I2 contain a word common to both, that is, the same word, a group that collects the words used in a pair with the common word (a common word group) is created.
[0032] For example, when the item name I1 is "child's full name" and the item name I2 is "mother's full name", the word common to both is "full name". The common word group "'full name' group" is therefore created, and "child" and "mother", which are the words (nouns) used in a pair with "full name" in the item names I1 and I2,
are each classified into the "full name" group as members. The common word group also includes the form ID corresponding to each word.
The dictionary creating device 10 performs the first process on all the item names I belonging to the procedure A to be processed and creates the common word groups for the item names of procedure A. It then repeats this process for each procedure and creates common word groups for all procedures.
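The first process can be sketched in Python as follows. This is a minimal illustration under assumptions, not the patented implementation: morphological analysis is assumed to have been done elsewhere, so each item name arrives as a list of its nouns, and the function name `build_common_word_groups` and the nested-dictionary layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

# (form ID, nouns extracted from one item name), for a single procedure
Item = Tuple[str, List[str]]


def build_common_word_groups(items: List[Item]) -> Dict[str, Dict[str, Set[str]]]:
    """Return {common word: {paired word: form IDs in which it appeared}}."""
    groups: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    # Compare every pair of item names belonging to the procedure.
    for (form_a, nouns_a), (form_b, nouns_b) in combinations(items, 2):
        for common in set(nouns_a) & set(nouns_b):
            # Store the words paired with the common word, with their form IDs.
            for form_id, nouns in ((form_a, nouns_a), (form_b, nouns_b)):
                for word in nouns:
                    if word != common:
                        groups[common][word].add(form_id)
    return {common: dict(members) for common, members in groups.items()}


items = [
    ("form-1", ["child", "full name"]),
    ("form-1", ["mother", "full name"]),
    ("form-2", ["juvenile", "full name"]),
]
groups = build_common_word_groups(items)
```

With this input the "full name" group contains "child" and "mother" (both from `form-1`) and "juvenile" (from `form-2`), mirroring the example in paragraph [0032].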
[0033] The procedure A to be processed can be obtained by having the user enter it. Alternatively, the dictionary creating device 10 may extract and process only the item names of the procedure to be processed, based on the procedure IDs or the like in the item name group IG.
In the above process, common word groups are created for each procedure. To create common word groups that span procedures, the process can instead be performed on all the item names included in the item name group IG.
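The choice between per-procedure grouping and grouping across procedures can be expressed as a small batching step placed before the first process. The Python sketch below is only an illustration; the function `partition_items`, the flag `per_procedure`, and the placeholder key `"*"` are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (procedure ID, form ID, item name)
Record = Tuple[str, str, str]


def partition_items(
    records: Iterable[Record], per_procedure: bool = True
) -> Dict[str, List[Tuple[str, str]]]:
    """Split the item name group into the batches the first process will run on."""
    batches: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for procedure_id, form_id, name in records:
        # One batch per procedure, or a single batch "*" spanning all procedures.
        key = procedure_id if per_procedure else "*"
        batches[key].append((form_id, name))
    return dict(batches)


records = [
    ("birth-registration", "form-1", "child's full name"),
    ("pregnancy-notification", "form-3", "mother's full name"),
]
by_procedure = partition_items(records)
across = partition_items(records, per_procedure=False)
```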
[0034] Next, for each of the common word groups created in the first process, the dictionary creating device 10 determines whether the words in the group are likely to be synonyms of one another or likely to be different-meaning words, and creates synonym candidates and different-meaning word candidates (second process: synonym/different-meaning word candidate creation process).
[0035] Specifically, the dictionary creating device 10 uses the form IDs to determine whether or not the words being processed are used in the same form. When the words are used in the same form, the dictionary creating device 10 determines that they are likely to be "different-meaning words" and updates the synonym/different-meaning word candidate storage unit with them as different-meaning word candidates. On the other hand, when the words are not used in the same form, it determines that they are likely to be "synonyms" and updates the synonym/different-meaning word candidate storage unit with them as synonym candidates.
[0036] For example, when the members "child; form 1" and "juvenile; form 2" have been classified into the "full name" group as words and their form IDs, the form IDs differ, so the words are determined not to be used in the same form, and "child" and "juvenile" are therefore recorded as "synonym" candidates.
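The second process for one common word group can be sketched in Python as follows. The sketch assumes the group is represented as a mapping from each word to the set of form IDs in which it appeared; the function name `classify_group` and the return format are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def classify_group(members: Dict[str, Set[str]]) -> Tuple[List[Pair], List[Pair]]:
    """Split all word pairs of one group into synonym and different-meaning candidates."""
    synonyms: List[Pair] = []
    different: List[Pair] = []
    # Words are visited in sorted order so the result is deterministic.
    for (w1, forms1), (w2, forms2) in combinations(sorted(members.items()), 2):
        if forms1 & forms2:
            # Used in the same form: likely different-meaning words.
            different.append((w1, w2))
        else:
            # Never used in the same form: likely synonyms.
            synonyms.append((w1, w2))
    return synonyms, different


members = {"child": {"form-1"}, "mother": {"form-1"}, "juvenile": {"form-2"}}
synonym_candidates, different_candidates = classify_group(members)
```

In this small example "child" and "juvenile" become synonym candidates as in paragraph [0036], but so do "juvenile" and "mother", simply because they never share a form. The results are therefore best read as candidates that indicate a likelihood rather than as a final judgment.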
[0037] 上記第 2処理を全てのグループに対して実行し、 また全ての手続きに対し \¥02020/175662 10 卩(:171?2020/008190 [0037] The above second process is executed for all groups, and for all procedures. \¥02020/175662 10 ((171?2020/008190
て実行することで、 同義語候補、 異義語候補を作成する。 By executing the above, synonym candidates and synonym candidates are created.
[0038] 辞書作成装置 1 0は、 第 2処理で作成された同 ·異義語辞書候補をユーザ に提示し、 承認入力を受け付ける。 具体的には、 辞書作成装置 1 0は、 辞書 作成装置 1 〇に設けられた表示部、 又は通信回線を介して接続された表示装 置等に同 ·異義語辞書候補の情報を表示させる。 そして、 直接又は通信回線 を介して接続された入力装置からの入力を受け付ける。 [0038] The dictionary creation device 10 presents the synonym dictionary candidates created in the second process to the user and accepts the approval input. Specifically, the dictionary creating device 10 causes the display unit provided in the dictionary creating device 10 or a display device or the like connected via a communication line to display the information of the synonym dictionary candidates. Then, it receives an input from an input device connected directly or via a communication line.
[0039] The dictionary creating device 10 accepts the approval input from the user, applies the approval or rejection to each synonym/non-synonym candidate, and creates or updates the final synonym/non-synonym dictionary (synonym/non-synonym dictionary update process).
[0040] In the above embodiment, synonym/non-synonym candidates are created, approval or rejection of each candidate is accepted, and the final synonym/non-synonym dictionary is then fixed. The invention is not limited to this, however: the candidates created in the second process may be adopted as the synonym/non-synonym dictionary as they are.
[0041] In this way, the dictionary creating device 10 determines, for the item names I acquired from a plurality of forms belonging to a procedure, whether they are synonyms or non-synonyms, and creates the synonym/non-synonym dictionary. The resulting dictionary can be used to unify and standardize the differing item names that appear on the different forms used by multiple local governments and the like for the same procedure.
Furthermore, this series of processes can also be learned by a machine-learning model. Such learning makes it possible to build a more automated and efficient dictionary generation function.
[0042] [Functions provided in the dictionary creating device 10]
The functions that the dictionary creating device 10 provides in order to realize the processing described above are explained below.
[0043] FIG. 3 shows a functional block diagram of the dictionary creating device 10. As shown in FIG. 3, the dictionary creating device 10 has the following functional units: an item name storage unit 20A, a common word group storage unit 20B, a synonym/non-synonym candidate storage unit 20C, a synonym/non-synonym dictionary storage unit 20D, an item name acquisition unit 21A, a first processing unit 21B, a second processing unit 21C, a presentation unit 21D, a reception unit 21E, and an update unit 21F.
[0044] The functions of the units above are carried out by the processor 11 operating each unit of the dictionary creating device 10 in accordance with a program (the dictionary creating program) stored in the storage device 12. The dictionary creating device 10 may obtain this program over a communication network through its communication interface, or may read it from a storage medium on which the program is stored.
The dictionary creating method according to the present invention is realized when the processor 11 of the dictionary creating device 10 operates in accordance with the dictionary creating program. The functions of each unit are described in detail below.
[0045] [Item name storage unit 20A]
The item name storage unit 20A stores information on the item names extracted from the forms handled by the dictionary creating device 10. The item name storage unit 20A is realized mainly by the storage device 12 of the dictionary creating device 10.
[0046] Specifically, the item name storage unit 20A is realized by an item name table (not shown) stored in the storage device 12. As one example, the item name table stores, for each item name, the item name itself, the form identification information of the form from which it was extracted, and the procedure identification information of the procedure to which that form belongs. The form identification information and the procedure identification information are, for example, a form ID and a procedure ID. Forms used for the same procedure are nevertheless given different form identification information when their users (local governments, national bodies, companies, and so on) differ.
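One way to picture the item name table of paragraph [0046] is as a list of three-field records. The sketch below is illustrative only: the class name `ItemNameRecord`, the helper `items_for_procedure`, and the sample IDs are hypothetical, since the application does not show the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemNameRecord:
    item_name: str      # item name as extracted from the form
    form_id: str        # identifies the form (differs per user of the form)
    procedure_id: str   # identifies the procedure the form belongs to


# Two municipalities use different forms for the same procedure,
# so the form IDs differ while the procedure ID is shared.
item_name_table = [
    ItemNameRecord("child's name", "form1", "proc1"),
    ItemNameRecord("pupil's name", "form2", "proc1"),
]


def items_for_procedure(table, procedure_id):
    """Return the records of one procedure, in table order."""
    return [r for r in table if r.procedure_id == procedure_id]
```

Keeping the form ID on every record is what later allows the second process to ask whether two words ever appear on the same form.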
[0047] [Common word group storage unit 20B]
The common word group storage unit 20B stores information on the one or more common word groups created by the dictionary creating device 10. The common word group storage unit 20B is realized mainly by the storage device 12 of the dictionary creating device 10.
[0048] Specifically, the common word group storage unit 20B is realized by a common word group table (not shown) stored in the storage device 12. As one example, the common word group table stores a common word name, words, and the form identification information of forms.
There is one common word name per common word group. For the "name" group, for example, the common word is "name".
A word is a member of that common word group. For example, when the first process classifies the item name "child's name" into the "name" group, the member word is "child", the word that was paired with the common word, that is, the word that made up the item name together with the common word.
Form identification information is stored for each word and is the same as the form identification information in the item name storage unit 20A. When one word is used in several forms, several pieces of form identification information are stored for that word.
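The common word group table of paragraph [0048] maps naturally onto a nested mapping: common word, then member word, then the set of form IDs for that word. The following is a sketch under that assumption; the names `common_word_groups` and `add_member` and the sample values are hypothetical.

```python
from collections import defaultdict

# common word -> member word -> set of form IDs where the member appears
common_word_groups = defaultdict(lambda: defaultdict(set))


def add_member(groups, common_word, word, form_id):
    """Register `word` under `common_word`, accumulating its form IDs."""
    groups[common_word][word].add(form_id)


add_member(common_word_groups, "name", "child", "form1")
add_member(common_word_groups, "name", "child", "form3")  # same word, second form
add_member(common_word_groups, "name", "pupil", "form2")
```

Using a set per word covers the case where one word is used in several forms without storing duplicate IDs.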
[0049] [Synonym/non-synonym candidate storage unit 20C]
The synonym/non-synonym candidate storage unit 20C stores data (not shown), created by the dictionary creating device 10, that contains information identifying the words that are synonym candidates and information identifying the words that are non-synonym candidates. It is realized mainly by the storage device 12 of the dictionary creating device 10. As one example, it stores the same kind of content as the synonym/non-synonym dictionary storage unit 20D described below.
[0050] [Synonym/non-synonym dictionary storage unit 20D]
Specifically, the synonym/non-synonym dictionary storage unit 20D is realized by a synonym/non-synonym dictionary table (not shown) stored in the storage device 12. It stores synonym dictionary data (not shown), created by the dictionary creating device 10, containing information that identifies words in a synonymous relationship, and non-synonym dictionary data (not shown) containing information that identifies words that differ in meaning. The synonym/non-synonym dictionary storage unit 20D is realized mainly by the storage device 12 of the dictionary creating device 10.
[0051] As one example, the synonym/non-synonym dictionary storage unit 20D stores word 1, word 2, the relation between word 1 and word 2, and the procedure. The relation records the result of the determination or approval for word 1 and word 2, for example "synonym", "non-synonym", "synonym within the procedure", or "non-synonym within the procedure".
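The four-field dictionary record of paragraph [0051] could be modelled as below. This is a sketch, not the disclosed table layout: `Relation`, `DictionaryEntry`, and `lookup` are hypothetical names, and treating a word pair as unordered during lookup is an assumption made here for convenience.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    SYNONYM = "synonym"
    NON_SYNONYM = "non-synonym"
    SYNONYM_IN_PROCEDURE = "synonym within the procedure"
    NON_SYNONYM_IN_PROCEDURE = "non-synonym within the procedure"


@dataclass(frozen=True)
class DictionaryEntry:
    word1: str
    word2: str
    relation: Relation
    procedure_id: str


def lookup(entries, a, b):
    """Find the stored relation for a word pair (order ignored), or None."""
    for e in entries:
        if {e.word1, e.word2} == {a, b}:
            return e.relation
    return None


entries = [DictionaryEntry("child", "pupil", Relation.SYNONYM_IN_PROCEDURE, "proc1")]
```

An enumeration keeps the relation values closed, which makes it harder to store a label the rest of the system does not understand.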
[0052] [Item name acquisition unit 21A]
The item name acquisition unit 21A executes the item name acquisition process described above and acquires a plurality of item names written on a plurality of forms. The item name acquisition unit 21A is realized mainly by the processor 11, the storage device 12, and the communication interface 13 of the dictionary creating device 10.
The processing executed by the item name acquisition unit 21A corresponds to the item name acquisition step.
[0053] Specifically, the processor 11 acquires, from the form processing device 30 via the communication interface 13, the analysis results of the plurality of forms to be processed. The analysis results include character string data for one or more item names obtained from the forms by optical character recognition, procedure identification information, and form identification information.
[0054] The item name acquisition unit 21A acquires the plurality of item names written on the plurality of forms that different local governments use for the same procedure. Together with each item name, it acquires procedure identification information and form identification information, such as a procedure ID and a form ID, that make it possible to tell which form of which procedure the item name was extracted from. As one example, the procedure ID and the form ID can be taken from information the user enters when the form is imported.
The item name acquisition unit 21A may instead acquire images of the plurality of forms from the form processing device 30 and obtain the character string data of the item names from those images by predetermined image processing.
[0055] [First processing unit 21B]
The first processing unit 21B executes the first process described above: it classifies the one or more words contained in each of the item names acquired by the item name acquisition unit 21A into one or more common word groups, thereby creating the common word groups. The first processing unit 21B is realized mainly by the processor 11 and the storage device 12 of the dictionary creating device 10.
The processing executed by the first processing unit 21B corresponds to the first processing step.
[0056] Specifically, for item names that contain a word shared by several item names, the first processing unit 21B takes the words other than the shared word, that is, the words used together with the shared word to make up an item name, and groups them under that shared word, one group per shared word.
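The grouping rule of paragraph [0056] can be sketched as follows. The sketch assumes that each item name has already been split into nouns (the embodiment uses morphological analysis for this, which is not reproduced here) and that a word counts as "common" once it occurs in at least two item names; the function name `build_common_word_groups` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def build_common_word_groups(items):
    """items: list of (words, form_id); words are the nouns of one item name."""
    # A word is treated as "common" when it occurs in at least two item names.
    occurrences = Counter(w for words, _ in items for w in set(words))
    common = {w for w, n in occurrences.items() if n >= 2}

    # common word -> member word -> set of form IDs
    groups = defaultdict(lambda: defaultdict(set))
    for words, form_id in items:
        for c in common.intersection(words):
            for w in words:
                if w != c:  # members are the words other than the common word
                    groups[c][w].add(form_id)
    return groups


items = [
    (["child", "name"], "form1"),
    (["pupil", "name"], "form2"),
    (["child", "address"], "form1"),
]
groups = build_common_word_groups(items)
```

With this input, the "name" group collects "child" and "pupil", while the "child" group collects "name" and "address", both taken from the same form.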
[0057] [Second processing unit 21C]
The second processing unit 21C executes the second process described above: for each common word group created in the first process, it determines whether the words in the group are likely to be synonyms or likely to be non-synonyms, and creates the synonym candidates and non-synonym candidates (synonym/non-synonym candidates). The second processing unit 21C is realized mainly by the processor 11 and the storage device 12 of the dictionary creating device 10.
The processing executed by the second processing unit 21C corresponds to the second processing step.
[0058] Specifically, the second processing unit 21C determines whether words are synonyms or non-synonyms on the basis of the form identification information, which is the information that identifies a form. When the words being processed have no form identification information in common, it determines that they are synonyms; when they do have form identification information in common, it determines that they are non-synonyms.
Only one of the two determinations, synonym or non-synonym, may be performed. In that case, only the synonym dictionary or only the non-synonym dictionary is ultimately created.
[0059] [Presentation unit 21D]
The presentation unit 21D presents the synonym/non-synonym candidates created in the second process by displaying them on the display device 32.
The presentation unit 21D is realized mainly by the processor 11, the storage device 12, and the communication interface 13 of the dictionary creating device 10.
[0060] Specifically, the processor 11 sends the synonym and/or non-synonym candidates stored in the synonym/non-synonym candidate storage unit to the form processing device 30 via the communication interface 13, and has them displayed on the display device 32 of the form processing device 30.
Alternatively, the processor 11 may skip the transmission to the form processing device 30 and display the candidates on a display device attached to the dictionary creating device.
[0061] [Reception unit 21E]
The reception unit 21E accepts, from the form processing device 30, information entered by the user, such as approval or rejection of the synonym/non-synonym candidates. Specifically, the processor 11 accepts the information from the form processing device 30 via the communication interface 13.
The reception unit 21E is realized mainly by the processor 11, the storage device 12, and the communication interface 13 of the dictionary creating device 10.
[0062] [Update unit 21F]
The update unit 21F applies the approval, rejection, and similar information accepted by the reception unit 21E to the synonym/non-synonym candidate data created by the second processing unit 21C, and creates or updates the final synonym/non-synonym dictionary. The update unit 21F is realized mainly by the processor 11 and the storage device 12 of the dictionary creating device 10.
The processing executed by the update unit 21F corresponds to the dictionary creating and updating step.
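Applying the user's decisions to the candidates might look like the sketch below. The function name `apply_decisions` and the data shapes are assumptions, and leaving out candidates that have no decision is one possible policy chosen for the example; the application does not state how undecided candidates are treated.

```python
def apply_decisions(candidates, decisions, dictionary=None):
    """Merge approved candidates into the final dictionary.

    candidates: {(word1, word2): label} produced by the second process.
    decisions:  {(word1, word2): True/False} entered by the user
                (True = approve, False = reject).
    Candidates without a decision are skipped (assumed policy).
    """
    dictionary = dict(dictionary or {})  # copy so the input is not mutated
    for pair, label in candidates.items():
        if decisions.get(pair) is True:
            dictionary[pair] = label
    return dictionary


candidates = {
    ("child", "pupil"): "synonym",
    ("address", "name"): "non-synonym",
}
decisions = {("child", "pupil"): True, ("address", "name"): False}
final_dictionary = apply_decisions(candidates, decisions)
```

Passing an existing dictionary as the third argument covers the "update" case, since approved pairs are added to a copy of it.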
[0063] [Flow of processing by the dictionary creating device 10]
Next, the flow of the dictionary creation processing executed by the dictionary creating device 10 is described with reference to FIGS. 4 and 5.
[0064] As shown in FIG. 4, the dictionary creating device 10 initializes the counter that indexes the procedures to 1 (S1) and selects one procedure from the plurality of procedures as the procedure to be processed (S2). The procedure may instead be selected by accepting input from the user.
[0065] The dictionary creating device 10 then initializes the variable i to 1 (S3), acquires an item name I_i belonging to the selected procedure (S4), and performs morphological analysis to extract the nouns contained in I_i, obtaining its words (S5). Next, it selects an item name I_{i+m} belonging to the same procedure (S6) and likewise performs morphological analysis to extract the nouns contained in I_{i+m}, obtaining its words (S7).
[0066] Next, the dictionary creating device 10 compares the words extracted from I_i with the words extracted from I_{i+m} and determines whether they have a word in common (S8). If there is no common word (S8; NO), the processing ends. If there is a common word (S8; YES), the device searches the common word group storage unit 20B to check whether a common word group for that common word has already been created (S9).
[0067] If the common word group already exists (S9; YES), the dictionary creating device 10 stores the words of both item names, together with the form ID of each word, in that common word group (S10). If the common word group does not exist (S9; NO), the device creates a new common word group and stores the words of both item names, together with the form ID of each word, in it (S11).
[0068] The dictionary creating device 10 determines whether item name I_{i+m} is the last item name (S12). If processing has not been completed for all item names I_{i+m} (S12; NO), it adds 1 to m (S13) and returns to S6. If processing has been completed for all item names I_{i+m} (S12; YES), it proceeds to S14. If processing has not been completed for all item names I_i (S14; NO), it adds 1 to i (S15) and returns to S4. If processing has been completed for all item names I_i (S14; YES), it proceeds to S16.
[0069] At S16, the dictionary creating device 10 determines whether all of the procedures have been processed. If not, it proceeds to S17 and adds 1 to the procedure counter. If processing has been completed for all procedures, the processing ends.
[0070] Next, the dictionary creating device 10 executes the processing shown in FIG. 5 on each common word group created as described above. First, it initializes the procedure counter and the variable k (S21) and acquires the procedure (S22). It then selects the k-th common word group (S23). Next, it initializes i (S24) and forms the direct (Cartesian) product of the words stored in that common word group (S25). For each element of the product, it calculates a count indicating the number of forms in which the words are used together (S26). The count is obtained by tallying the form IDs of the words in the common word group per form ID.
[0071] The dictionary creating device 10 determines whether the calculated count is greater than 0 (S27). If it is greater than 0 (S27; YES), the device determines that the words are non-synonyms, writes them to the synonym/non-synonym candidate storage unit as non-synonyms (S28), and proceeds to S30. If the count is 0 (S27; NO), the device determines that the words are synonyms, writes them to the synonym/non-synonym candidate storage unit as synonyms (S29), and proceeds to S30.
[0072] The dictionary creating device 10 determines whether word i is the last word (S30). If processing has not been completed for all words i (S30; NO), it adds 1 to i (S31) and returns to S25. If processing has been completed for all words i (S30; YES), it proceeds to S32. Next, it determines whether processing has been completed for all common word groups (S32). If not (S32; NO), it adds 1 to k (S33) and returns to S23. If processing has been completed for all common word groups (S32; YES), it proceeds to S34.
[0073] At S34, the device determines whether all of the procedures have been processed. If not (S34; NO), it proceeds to S35 and adds 1 to the procedure counter. If processing has been completed for all procedures, the processing ends.
In this way, the processing shown in FIG. 5 determines whether the words in a common word group are synonyms or non-synonyms on the basis of whether the words being processed are used in the same form. The processing in FIG. 5 is only one example of how to determine use within the same form; any processing that can determine whether the words are used in the same form may be used instead.
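The per-group loop of FIG. 5 (form the word pairs, count shared forms, compare the count with zero) can be sketched as below. Two points are assumptions of this sketch rather than statements from the application: it enumerates unordered pairs of distinct words with `itertools.combinations` instead of the full direct product, and it takes the count to be the number of form IDs the two words share. The function name `judge_group` and the sample data are hypothetical.

```python
from itertools import combinations


def judge_group(members):
    """members: {word: set of form IDs} for one common word group.

    Returns {(word_a, word_b): label} for every unordered pair of words.
    """
    results = {}
    for a, b in combinations(sorted(members), 2):
        shared = len(members[a] & members[b])  # forms containing both words
        # Count > 0 -> non-synonym; count == 0 -> synonym (cf. S27 to S29).
        results[(a, b)] = "non-synonym" if shared > 0 else "synonym"
    return results


members = {
    "child": {"form1", "form3"},
    "pupil": {"form2"},
    "guardian": {"form3"},
}
judgements = judge_group(members)
```

Here "child" and "guardian" both appear on `form3`, so they are judged non-synonyms, while "pupil" shares no form with either word and is judged a synonym of both.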
[0074] As described above, according to this embodiment, words can be obtained from the item names extracted from forms, and a synonym dictionary and a non-synonym dictionary can be created.
Furthermore, this series of processes can also be learned by a machine-learning model. Such learning makes it possible to build a more automated and efficient dictionary generation function.
[0075] [Other embodiments]
The present invention is not limited to the above embodiment.
The dictionary creating device 10 and the form processing device 30 may be configured as a single device.
The dictionary creating device 10 is also not limited to a single computer and may be made up of multiple computers.

Explanation of reference signs
[0076]
1 Information processing system
10 Dictionary creating device
11 Processor
12 Storage device
13 Communication interface
20A Item name storage unit
20B Common word group storage unit
20C Synonym/non-synonym candidate storage unit
20D Synonym/non-synonym dictionary storage unit
21A Item name acquisition unit
21B First processing unit
21C Second processing unit
21D Presentation unit
21E Reception unit
21F Update unit
30 Form processing device
31 Input device
32 Display device
40 Scanner
01, 02, 03 Common item group
I Item name
I 〇 Item name group
Form
Form group

Claims

[Claim 1] A dictionary creating device that creates at least one of a synonym dictionary and a non-synonym dictionary of item names of forms, comprising:
an item name acquisition unit that acquires a plurality of item names written on a plurality of forms;
a first processing unit that classifies, on the basis of a predetermined condition, one or more words contained in each of the plurality of item names acquired by the item name acquisition unit, and creates one or more common word groups; and
a second processing unit that determines, for each common word group and on the basis of information identifying the forms, whether the words in the common word group are synonymous with or different in meaning from one another.
[Claim 2] The dictionary creating device according to claim 1, wherein the first processing unit classifies into the same common word group the words, other than the common word, of item names that contain a word common to a plurality of the item names.
[Claim 3] The dictionary creating device according to claim 1 or 2, wherein the second processing unit determines that the words in one common word group are synonyms of one another when they are not used in the same form.
[Claim 4] The dictionary creating device according to any one of claims 1 to 3, wherein:
the item name acquisition unit acquires, for each item name, form identification information identifying the form on which the acquired item name was written;
the common word group is stored in a common word group storage unit and has the words belonging to the common word group and, for each word, the form identification information of the form on which the word was written; and
the second processing unit determines that the words being processed are synonyms when they have no form identification information in common.
[Claim 5] The dictionary creating device according to claim 4, wherein the second processing unit determines that the words being processed are non-synonyms when they have form identification information in common.
[Claim 6] A dictionary creating method performed by a dictionary creating device for creating at least one of a synonym dictionary and a non-synonym dictionary, in which the dictionary creating device performs:
an item name acquisition step of acquiring a plurality of item names written on a plurality of forms;
a first processing step of classifying, on the basis of a predetermined condition, one or more words contained in each of the plurality of item names acquired in the item name acquisition step, and creating one or more common word groups; and
a second processing step of determining, for each common word group and on the basis of information identifying the forms, whether the words in the common word group are synonymous with or different in meaning from one another.
[Claim 7] A dictionary creation program for creating at least one of a synonym dictionary and a non-synonym (異義語) dictionary of item names of forms, the program causing a computer to function as:
an item name acquisition unit that acquires a plurality of item names described in a plurality of forms;
a first processing unit that classifies, based on a predetermined condition, one or more words contained in each of the plurality of item names acquired by the item name acquisition unit, and creates one or more common word groups; and
a second processing unit that determines, for each common word group and based on information identifying the forms, whether the words in the common word group are synonymous or non-synonymous with each other.
PCT/JP2020/008190 2019-02-28 2020-02-27 Dictionary creating device, dictionary creating method, and dictionary creating program WO2020175662A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019-037050 2019-02-28
JP2019037050A JP7029813B2 (en) 2019-02-28 2019-02-28 Dictionary creation device, dictionary creation method and dictionary creation program

Publications (1)

Publication Number Publication Date
WO2020175662A1 (en) 2020-09-03

Family

ID=72240013

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/008190 WO2020175662A1 (en) 2019-02-28 2020-02-27 Dictionary creating device, dictionary creating method, and dictionary creating program

Country Status (2)

Country Link
JP (1) JP7029813B2 (en)
WO (1) WO2020175662A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112269858B (en) * 2020-10-22 2024-04-19 中国平安人寿保险股份有限公司 Optimization method, device, equipment and storage medium of synonymous dictionary
JP7410501B1 (en) 2023-08-07 2024-01-10 株式会社ミラボ Program, electronic application form creation method, and electronic application form creation system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6338758B2 (en) * 1980-03-05 1988-08-02 Tokyo Shibaura Electric Co
JP2012048291A (en) * 2010-08-24 2012-03-08 Dainippon Printing Co Ltd Synonym dictionary generation device, data analysis device, data detection device, synonym dictionary generation method, and synonym dictionary generation program

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5671676B2 (en) * 2010-08-31 2015-02-18 パナソニックヘルスケアホールディングス株式会社 Document data conversion apparatus and document conversion program
JP5524138B2 (en) * 2011-07-04 2014-06-18 日本電信電話株式会社 Synonym dictionary generating apparatus, method and program thereof
JP2013109597A (en) * 2011-11-21 2013-06-06 Panasonic Corp Medical synonym dictionary creating device and medical synonym dictionary creating method
JP6338758B1 (en) * 2017-11-10 2018-06-06 株式会社ナビット Distribution system, distribution method and program

Also Published As

Publication number Publication date
JP7029813B2 (en) 2022-03-04
JP2020140583A (en) 2020-09-03

Similar Documents

Publication Publication Date Title
US9639751B2 (en) Property record document data verification systems and methods
US8521561B2 (en) Database system, program, image retrieving method, and report retrieving method
US8064703B2 (en) Property record document data validation systems and methods
US20120102002A1 (en) Automatic data validation and correction
JP2001515623A (en) Automatic text summary generation method by computer
CN111274239A (en) Test paper structuralization processing method, device and equipment
US11727213B2 (en) Automatic conversation bot generation using input form
CN106708940A (en) Method and device used for processing pictures
US20220375246A1 (en) Document display assistance system, document display assistance method, and program for executing said method
WO2020175662A1 (en) Dictionary creating device, dictionary creating method, and dictionary creating program
CN112035675A (en) Medical text labeling method, device, equipment and storage medium
JP6529254B2 (en) INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, PROGRAM, AND STORAGE MEDIUM
JP2019003472A (en) Information processing apparatus and information processing method
US11386263B2 (en) Automatic generation of form application
JP2005190284A (en) Information classification device and method
CN110832831A (en) Call center conversation content display system, method, and program
US20220138259A1 (en) Automated document intake system
JP6964891B2 (en) Counter business management device, counter business management method and counter business management program
JP7041963B2 (en) Standard item name setting device, standard item name setting method and standard item name setting program
JP2002304401A (en) Device and method for processing questionnaire and program
JP7155546B2 (en) Information processing device, information processing method, and information processing program
JP5877775B2 (en) Content management apparatus, content management system, content management method, program, and storage medium
JP4169618B2 (en) Text information management device
CN111860263A (en) Information input method and device and computer readable storage medium
CN111931480A (en) Method and device for determining main content of text, storage medium and computer equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20763903

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15/11/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 20763903

Country of ref document: EP

Kind code of ref document: A1