WO2019080419A1 - Method for building standard knowledge base, electronic device, and storage medium

Method for building standard knowledge base, electronic device, and storage medium

Info

Publication number
WO2019080419A1
Authority
WO
WIPO (PCT)
Prior art keywords
answer
question
keyword
meaning
word
Application number
PCT/CN2018/076484
Other languages
French (fr)
Chinese (zh)
Inventor
卢川
高祎璠
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri

Definitions

  • S40': take the position of the first word sequence separator in the text as the position of the current word sequence separator, and the positions of the keyword separators before it as the positions of the current keyword separators;
  • S41': according to the question-answer pair generation rule, obtain the meaning keyword before each current keyword separator, embed the meaning keywords into the corresponding change items of the question template to generate a question, and store the question temporarily;
  • S42': obtain the numerical keyword before the current word sequence separator as the answer and store it temporarily;
  • S43': associate and save the temporarily stored question and answer;
  • S44': determine whether the current word sequence separator is the last word sequence separator in the answer file; if yes, go to step S47', otherwise go to step S45';
  • S45': move on to the next word sequence separator and take its position as the position of the current word sequence separator;
  • S46': reset the positions of the current keyword separators to the positions of the keyword separators before the current word sequence separator, and go to step S41';
  • S47': end.
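The text-format loop (S40' to S47') can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the separator characters `|` (keyword separator) and `;` (word sequence separator), the function name `generate_qa_pairs_from_text`, and the use of a `str.format` template with `{}` slots for the change items are all hypothetical, since the real separators are shown only in FIG. 12. The sample data reuses the two income values quoted in Embodiment 2 purely for illustration.

```python
from typing import List, Tuple

# Hypothetical separators; the actual ones are defined by the answer file of FIG. 12.
KEYWORD_SEP = "|"    # separates keywords inside one word sequence
SEQUENCE_SEP = ";"   # terminates each word sequence


def generate_qa_pairs_from_text(text: str, template: str) -> List[Tuple[str, str]]:
    """Walk the word sequences of a text answer file and emit (question, answer) pairs.

    Each word sequence is assumed to look like
    "<meaning keyword 1>|<meaning keyword 2>|<numerical keyword>;".
    """
    pairs: List[Tuple[str, str]] = []
    start = 0
    # S40': locate the first word sequence separator
    end = text.find(SEQUENCE_SEP, start)
    while end != -1:
        sequence = text[start:end].strip()
        # Keywords before each keyword separator are meaning keywords;
        # the keyword just before the word sequence separator is the numerical one.
        *meaning_keywords, numerical_keyword = sequence.split(KEYWORD_SEP)
        # S41': embed the meaning keywords into the change items of the template
        question = template.format(*meaning_keywords)
        # S42'-S43': the numerical keyword is the answer; associate and save the pair
        pairs.append((question, numerical_keyword))
        # S44'-S46': advance to the next word sequence separator, if there is one
        start = end + len(SEQUENCE_SEP)
        end = text.find(SEQUENCE_SEP, start)
    # S47': no word sequence separator left
    return pairs


if __name__ == "__main__":
    answers = "第一季度|意外险|2560000;\n第一季度|健康险|5246286;"
    for question, answer in generate_qa_pairs_from_text(answers, "{}{}的收入是多少"):
        print(question, answer)
```

Checking for "no further separator" after each advance is equivalent to S44' asking whether the current separator is the last one; it simply avoids tracking the last position separately.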

Abstract

A method for building a standard knowledge base, relating to the field of database maintenance. The method for building a standard knowledge base comprises the following steps: building an answer file (S1); building a question template (S2); setting constant items and question words (S3); and presetting a question-answer pair generating rule, and forming question-answer pairs (S4). By using the method to build a standard knowledge base, batch import of data can be achieved. Moreover, by automatically generating question-answer pairs according to a rule, the maintenance workload of a basic database is reduced, and the working efficiency is greatly improved.

Description

Method for constructing a standard knowledge base, electronic device, and storage medium
The present application claims priority to Chinese patent application No. 201711031785.9, filed on October 26, 2017 and entitled "Method for Constructing Standard Knowledge Base, Electronic Device and Storage Medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of database maintenance, and in particular to a method for constructing a standard knowledge base, an electronic device, and a storage medium.
Background
With the rapid development of the Internet, online customer service has spread to every industry and into every aspect of everyday business services. At present, online customer service usually consists of intelligent customer service robots and human agents. Intelligent customer service can provide round-the-clock service, including nights and holidays, and relieves the workload of human agents.
However, the answers of intelligent customer service robots are all preset and are usually stored in a base database as pairs, one answer for each question. Therefore, when the base database is constructed, as many question-answer pairs as possible must be maintained for the robot to answer intelligently, which makes maintaining the base database a huge workload that consumes considerable labor.
Summary of the Invention
The purpose of the present application is to provide a method for constructing a standard knowledge base, an electronic device, and a computer-readable storage medium. By setting composition rules for questions and answers, the system can automatically generate questions and answers from the received content according to the set rules.
The present application solves the above technical problem through the following technical solutions:
A method for constructing a standard knowledge base, comprising the following steps: S1, constructing an answer file: collecting answers, splitting the answers in a uniform format, and organizing them into a single file, the file being a table or text; S2, constructing a question template: determining the question template according to the answers in the answer file, the question template being a word sequence that includes several change items, several constant items, and a question word; S3, setting the constant items and the question word: determining the constant items and the question word of the question template according to the meaning expressed by the answers in the answer file; S4, forming question-answer pairs: according to a question-answer pair generation rule, obtaining the content at the corresponding positions in the answer file and embedding it into the corresponding change items of the question template to generate a question, obtaining the content at the corresponding position in the answer file to generate an answer, and associating and saving the generated question and answer as a question-answer pair.
An electronic device, comprising a memory and a processor, the memory storing a standard knowledge base construction system executable by the processor, the standard knowledge base construction system comprising: a file receiving module, configured to receive a prepared answer file, the answer file containing at least one answer, the answers being split in a uniform format and organized into a single file, the file being a table or text; a template setting module, configured to set a question template according to the answers in the answer file, the question template being a word sequence that includes several change items, several constant items, and a question word; an input module, configured to receive the content of the constant items and the question word; and a question-answer pair generation module, configured to, according to a question-answer pair generation rule, embed the content at the corresponding positions in the received answer file into the corresponding change items of the question template to generate a question, obtain the content at the corresponding position in the answer file to generate an answer, and associate and save the generated question and answer as a question-answer pair.
A computer-readable storage medium storing a standard knowledge base construction system, the standard knowledge base construction system being executable by at least one processor to implement the following steps: S1, constructing an answer file: collecting answers, splitting the answers in a uniform format, and organizing them into a single file, the file being a table or text; S2, constructing a question template: determining the question template according to the answers in the answer file, the question template being a word sequence that includes several change items, several constant items, and a question word; S3, setting the constant items and the question word: determining the constant items and the question word of the question template according to the meaning expressed by the answers in the answer file; S4, forming question-answer pairs: according to a question-answer pair generation rule, obtaining the content at the corresponding positions in the answer file and embedding it into the corresponding change items of the question template to generate a question, obtaining the content at the corresponding position in the answer file to generate an answer, and associating and saving the generated question and answer as a question-answer pair.
The beneficial effect of the present application is that, when it is used to construct a standard knowledge base, data can be imported in batches and question-answer pairs are generated automatically according to rules, which reduces the maintenance workload of the base database.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the hardware architecture of an embodiment of the electronic device of the present application;
FIG. 2 is a schematic diagram of the program modules of an embodiment of the standard knowledge base construction system in the electronic device of the present application;
FIG. 3 is a flowchart of Embodiment 1 of the method for constructing a standard knowledge base of the present application;
FIG. 4 is a flowchart of constructing the answer file in Embodiment 2 of the method of the present application;
FIG. 5 is a flowchart of constructing the question template in Embodiment 2 of the method of the present application;
FIG. 6 is a flowchart of setting the constant items and the question word in Embodiment 2 of the method of the present application;
FIG. 7 is a flowchart of forming question-answer pairs in Embodiment 2 of the method of the present application;
FIG. 8 is a schematic diagram of an answer file in table form in the method of the present application;
FIG. 9 is a flowchart of constructing the answer file in Embodiment 3 of the method of the present application;
FIG. 10 is a flowchart of constructing the question template in Embodiment 3 of the method of the present application;
FIG. 11 is a flowchart of forming question-answer pairs in Embodiment 3 of the method of the present application;
FIG. 12 is a schematic diagram of an answer file in text form in the method of the present application.
Detailed Description
The present application is further illustrated by the following embodiments, which are not intended to limit its scope.
First, the present application provides an electronic device.
FIG. 1 is a schematic diagram of the hardware architecture of an embodiment of the electronic device of the present application. In this embodiment, the electronic device 2 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. For example, it may be a smartphone, a tablet computer, a laptop, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including a standalone server or a server cluster composed of multiple servers). As shown in the figure, the electronic device 2 includes at least, but is not limited to, a memory 21, a processor 22, a network interface 23, and a standard knowledge base construction system 20, which can communicate with each other via a system bus, where:
The memory 21 includes at least one type of computer-readable storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 21 may be an internal storage unit of the electronic device 2, such as its hard disk or main memory. In other embodiments, the memory 21 may be an external storage device of the electronic device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 2. The memory 21 may also include both an internal storage unit and an external storage device of the electronic device 2. In this embodiment, the memory 21 is generally used to store the operating system and the various application software installed on the electronic device 2, such as the program code of the standard knowledge base construction system 20. In addition, the memory 21 may be used to temporarily store various data that has been or will be output.
In some embodiments, the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 22 is generally used to control the overall operation of the electronic device 2, for example, performing control and processing related to data interaction or communication with the electronic device 2. In this embodiment, the processor 22 is used to run the program code stored in the memory 21 or to process data, for example, to run the standard knowledge base construction system 20.
The network interface 23 may include a wireless or wired network interface and is generally used to establish a communication connection between the electronic device 2 and other electronic devices. For example, the network interface 23 is used to connect the electronic device 2 to an external terminal through a network and to establish a data transmission channel and a communication connection between them. The network may be a wireless or wired network such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, or Wi-Fi.
It should be noted that FIG. 1 shows only the electronic device 2 with components 21-23; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
In this embodiment, the standard knowledge base construction system 20 stored in the memory 21 may be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 22 in this embodiment) to implement the present application.
For example, FIG. 2 is a schematic diagram of the program modules of an embodiment of the standard knowledge base construction system 20. In this embodiment, the system 20 may be divided into a file receiving module 201, a template setting module 202, an input module 203, and a question-answer pair generation module 204. The specific functions of the program modules 201-204 are described below.
The file receiving module 201 is configured to receive the prepared answer file, which contains at least one answer; the answers are split in a uniform format and organized into a single file, the file being a table or text. The template setting module 202 is configured to set a question template according to the answers in the answer file, the question template being a word sequence that includes several change items, several constant items, and a question word. The input module 203 is configured to receive the content of the constant items and the question word. The question-answer pair generation module 204 is configured to, according to the question-answer pair generation rule, embed the content at the corresponding positions in the received answer file into the corresponding change items of the question template to generate a question, obtain the content at the corresponding position in the answer file to generate an answer, and associate and save the generated question and answer as a question-answer pair.
In this embodiment, the content of the answer file must be organized in a uniform format in advance. Note that this format must match the question-answer pair generation rule: if the answer file is organized as text, the generation rule must be one for text-format files; if the answer file is organized as a table, the generation rule must be one for table-format files.
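One way to picture modules 201-204 and the requirement that the answer-file format match the generation rule is a builder that looks up a generation rule by format. The sketch below is hypothetical throughout: the class `StandardKnowledgeBaseBuilder`, the format keys, and the toy `rows_rule` are illustrative names, not part of the application, and the knowledge base is modelled as an in-memory list.

```python
from typing import Any, Callable, Dict, List, Tuple

QAPair = Tuple[str, str]
# A generation rule takes the parsed answer file and a question template.
GenerationRule = Callable[[Any, str], List[QAPair]]


class StandardKnowledgeBaseBuilder:
    """Hypothetical sketch of program modules 201-204 of system 20."""

    def __init__(self, rules: Dict[str, GenerationRule]) -> None:
        # One generation rule per answer-file format, e.g. "table" or "text"
        self.rules = rules
        self.knowledge_base: List[QAPair] = []

    def build(self, answer_file: Any, file_format: str, template: str) -> List[QAPair]:
        # File receiving module 201: accept the answer file and its declared format
        rule = self.rules.get(file_format)
        if rule is None:
            # The answer-file format must match a registered generation rule
            raise ValueError(f"no generation rule for format {file_format!r}")
        # Modules 202/203 supply `template`; module 204 applies the matching rule
        pairs = rule(answer_file, template)
        self.knowledge_base.extend(pairs)
        return pairs


def rows_rule(rows: List[List[str]], template: str) -> List[QAPair]:
    # Toy rule: every field but the last fills a change item; the last is the answer
    return [(template.format(*row[:-1]), row[-1]) for row in rows]


if __name__ == "__main__":
    builder = StandardKnowledgeBaseBuilder({"rows": rows_rule})
    print(builder.build([["第一季度", "意外险", "2560000"]], "rows", "{}{}的收入是多少"))
```

Failing loudly on an unregistered format is one way to enforce the "format must match the rule" constraint, rather than silently applying a rule written for a different layout.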
Second, the present application provides a method for constructing a standard knowledge base.
In Embodiment 1, as shown in FIG. 3, the method for constructing a standard knowledge base includes the following steps: S1, constructing an answer file: collecting answers, splitting the answers in a uniform format, and organizing them into a single file, the file being a table or text; S2, constructing a question template: determining the question template according to the answers in the answer file, the question template being a word sequence that includes several change items, several constant items, and a question word; S3, setting the constant items and the question word: determining the constant items and the question word of the question template according to the meaning expressed by the answers in the answer file; S4, forming question-answer pairs: according to a question-answer pair generation rule, obtaining the content at the corresponding positions in the answer file and embedding it into the corresponding change items of the question template to generate a question, obtaining the content at the corresponding position in the answer file to generate an answer, and associating and saving the generated question and answer as a question-answer pair.
In this embodiment, the answer file may be text or a table. The steps of the method are described in further detail below for a table-format answer file and for a text-format answer file, respectively.
In Embodiment 2, building on Embodiment 1, the steps of Embodiment 1 are further described for an answer file in table format, as follows:
1. Constructing the answer file (as shown in FIG. 4)
S11: collect the answers;
S12: split each answer into a word sequence consisting of several keywords;
S13: obtain, in each word sequence, the two meaning keywords that express the meaning of the answer;
S14: de-duplicate the meaning keywords and classify them;
S15: use one class of meaning keywords as the first row of the table and the other class as the first column, leaving the cell where the first row and first column intersect blank;
S16: obtain, in each word sequence, the numerical keyword that expresses the value of the answer;
S17: fill each numerical keyword into the cell where the row and the column of the two meaning keywords in its word sequence intersect.
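Steps S14, S15, and S17 amount to pivoting a list of records into a grid. The sketch below assumes that S12, S13, and S16 have already reduced each answer to a triple (first-row keyword, first-column keyword, numerical keyword), with each meaning keyword already assigned to its class; how the segmentation and classification are done is not specified here, and the function name `build_answer_table` is hypothetical.

```python
from typing import List, Tuple


def build_answer_table(records: List[Tuple[str, str, str]]) -> List[List[str]]:
    """Lay out (first-row keyword, first-column keyword, numerical keyword) records as a table.

    Assumes S12-S13 and S16 have already reduced each answer to its two
    classified meaning keywords and its numerical keyword.
    """
    # S14: de-duplicate each class of meaning keywords, keeping first-seen order
    top_keywords = list(dict.fromkeys(top for top, _, _ in records))
    side_keywords = list(dict.fromkeys(side for _, side, _ in records))
    # S15: first row and first column hold the two classes; the corner cell stays blank
    table = [[""] + top_keywords]
    table += [[side] + [""] * len(top_keywords) for side in side_keywords]
    column_of = {kw: i + 1 for i, kw in enumerate(top_keywords)}
    row_of = {kw: i + 1 for i, kw in enumerate(side_keywords)}
    # S17: write each numerical keyword where its two meaning keywords' row and column cross
    for top, side, value in records:
        table[row_of[side]][column_of[top]] = value
    return table
```

For the two values quoted in the worked example, `build_answer_table([("第一季度", "意外险", "2560000"), ("第一季度", "健康险", "5246286")])` returns `[["", "第一季度"], ["意外险", "2560000"], ["健康险", "5246286"]]`, an excerpt in the layout of FIG. 8. `dict.fromkeys` is used because it de-duplicates while preserving insertion order.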
2. Constructing the question template (as shown in FIG. 5)
S21: determine the number of change items in the question template according to the number of classes of meaning keywords in the answer file;
S22: add function words between the change items, the constant item, and the question word according to grammar, so as to form a grammatical question template.
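S21 and S22 can be sketched as assembling a format string. The application does not give a general grammar rule for S22, so this sketch hard-codes the pattern from the worked example ("change items + 的 + constant item + 是 + question word"); the `{}` slot marker and the function and parameter names are assumptions for illustration.

```python
SLOT = "{}"  # hypothetical marker for one change item (a str.format placeholder)


def build_question_template(num_keyword_classes: int, constant_item: str,
                            question_word: str, attributive: str = "的",
                            copula: str = "是") -> str:
    """Build a question template string with one slot per class of meaning keywords."""
    # S21: one change item per class of meaning keywords in the answer file
    change_items = SLOT * num_keyword_classes
    # S22: join change items, constant item and question word with function words
    return f"{change_items}{attributive}{constant_item}{copula}{question_word}"


if __name__ == "__main__":
    template = build_question_template(2, "收入", "多少")
    print(template)                            # {}{}的收入是多少
    print(template.format("第一季度", "意外险"))  # 第一季度意外险的收入是多少
```

Representing change items as positional `str.format` slots keeps the template a plain string, so filling it in S4 is a single `format` call.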
3. Setting the constant item and the question word (as shown in FIG. 6)
S31: split each answer into a word sequence consisting of several keywords;
S32: take the abstract noun among the meaning keywords that express the meaning of the answer in the word sequence as the constant item;
S33: place after the constant item a question word suited to asking about the numerical keywords in the answer file.
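S32 and S33 can be sketched as a selection over part-of-speech-tagged keywords. This assumes some tagger (not specified in the application) has already labelled each meaning keyword; the tag value `"abstract_noun"`, the function name, and the default question word `"多少"` (taken from the worked example for numerical answers) are assumptions.

```python
from typing import List, Optional, Tuple

# Hypothetical part-of-speech label for abstract nouns; any tagger could supply it.
ABSTRACT_NOUN = "abstract_noun"


def set_constant_and_question_word(tagged_meaning_keywords: List[Tuple[str, str]],
                                   numeric_question_word: str = "多少") -> Tuple[str, str]:
    """Pick the constant item (S32) and the question word that follows it (S33)."""
    # S32: the abstract noun among the meaning keywords becomes the constant item
    constant_item: Optional[str] = next(
        (word for word, tag in tagged_meaning_keywords if tag == ABSTRACT_NOUN), None)
    if constant_item is None:
        raise ValueError("no abstract noun among the meaning keywords")
    # S33: the answers are numerical keywords, so ask for a quantity after the constant item
    return constant_item, numeric_question_word
```

For example, with tags `[("第一季度", "time"), ("意外险", "noun"), ("收入", "abstract_noun")]` the function returns `("收入", "多少")`, the constant item and question word used in the worked example below.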
四、形成问题-答案对(如图7所示)Fourth, the formation of the problem - the answer pair (as shown in Figure 7)
S40. Take the first non-blank cell in the first row and the first non-blank cell in the first column of the table as the current positions of the two variable items.
S41. According to the question-answer pair generation rule, embed the meaning keywords at the current positions into the corresponding variable-item positions of the question template to generate a question, and store it temporarily.
S42. Take the numerical keyword in the cell where the row and the column of the two meaning keywords of the generated question intersect, and store it temporarily as the answer.
S43. Save the temporarily stored question and answer in association with each other.
S44. Determine whether the meaning keyword at the current position of the first variable item is the last word in the first row or first column in which it is located; if so, go to step S46; otherwise, go to step S45.
S45. Advance the current position of the first variable item by one cell along the first row or first column containing its meaning keyword, update the current position of the first variable item accordingly, and return to step S41.
S46. Determine whether the meaning keyword at the current position of the second variable item is the last word in the first column or first row in which it is located; if so, go to step S48; otherwise, go to step S47.
S47. Advance the current position of the second variable item by one cell along the first column or first row containing its meaning keyword, and return to step S41.
S48. End.
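A minimal Python sketch of steps S40 to S48, assuming the table is held as a list of rows with a blank corner cell (step S15) and that the question template is a format string with two named slots. The names `generate_pairs_from_table`, `first_filled`, `row_kw` and `col_kw` are hypothetical. Two nested loops stand in for the explicit jumps between steps: the inner loop plays the first variable item and the outer loop the second. The steps do not state it explicitly, but the sketch restarts the first variable item at the top of its header each time the second one advances, which is what the worked example that follows does.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical layout: a list of rows; table[0][0] is the blank corner cell,
# table[0][1:] holds one class of meaning keywords, the first column the other.
Table = Sequence[Sequence[Optional[str]]]


def first_filled(cells: Sequence[Optional[str]]) -> int:
    """Index of the first non-blank cell (used by step S40)."""
    return next(i for i, cell in enumerate(cells) if cell)


def generate_pairs_from_table(table: Table, template: str) -> List[Tuple[str, Optional[str]]]:
    """Sketch of steps S40-S48 for a two-variable answer table."""
    header_row = list(table[0])
    header_col = [row[0] for row in table]
    pairs: List[Tuple[str, Optional[str]]] = []
    # Outer loop: the second variable item walks along the first row (S46/S47).
    for col in range(first_filled(header_row), len(header_row)):
        # Inner loop: the first variable item walks down the first column
        # (S44/S45) and restarts for every new column, as in the worked example.
        for row in range(first_filled(header_col), len(header_col)):
            # S41: embed both meaning keywords into the question template.
            question = template.format(row_kw=header_col[row], col_kw=header_row[col])
            # S42: the numerical keyword sits where that row and column cross.
            answer = table[row][col]
            # S43: keep the question and answer together as one pair.
            pairs.append((question, answer))
    # S48: both variable items have reached their last keyword.
    return pairs


if __name__ == "__main__":
    # Only the two first-quarter figures quoted in the worked example are used.
    table = [
        [None, "the first quarter"],
        ["accident insurance", "2560000"],
        ["health insurance", "5246286"],
    ]
    template = "What is the income of {row_kw} in {col_kw}?"
    for q, a in generate_pairs_from_table(table, template):
        print(q, "->", a)
```

Nested loops were chosen over a literal translation of the goto-style steps because they make the visiting order (every first-column keyword for each first-row keyword) explicit.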
The following example describes in detail how to build a standard knowledge base of the quarterly income of each insurance type:
1. Collect answers on the quarterly income of each insurance type and organize them into a table-format answer file as shown in Figure 8. The first row and the first column hold the two classes of meaning keywords, time periods and insurance types respectively (the arrangement may also be reversed, with insurance types in the first row and time periods in the first column). The cell where a given insurance type and a given time period intersect holds the income of that insurance type in that period.
2. Since the table contains two classes of meaning keywords, the question template has two variable items. Following the grammar, the question template is set to "two variable items + 的 (of) + one constant term + 是 (is) + question word".
3. From the collected answers, the constant term is determined to be "income" (收入) and the question word "how much" (多少), giving the more specific template "two variable items + 的 + income + 是 + how much" for this answer file.
4. The meaning keywords in the second cell of the first row and the second cell of the first column are embedded into the two variable-item positions of the template, generating the question "What is the income of accident insurance in the first quarter?". The corresponding answer is the value "2560000" in the cell where the second column and the second row intersect, and the question and answer are saved to the standard knowledge base as an associated question-answer pair. Next, the meaning keywords in the second cell of the first row and the third cell of the first column are embedded into the template, generating the question "What is the income of health insurance in the first quarter?", whose answer is the value "5246286" in the cell where the second column and the third row intersect. This continues until the meaning keyword in the last cell of the first column has been used. Then the meaning keyword in the third cell of the first row is taken and combined in turn with the meaning keyword in each cell of the first column, and the resulting question-answer pairs are saved to the standard knowledge base in order.
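The template assembly in steps 2 and 3 can be sketched as plain string concatenation. This is an illustrative assumption, not the application's implementation: `build_question_template` is a hypothetical helper, the variable items become Python `str.format` positional fields, and the strings stay in Chinese because the function words 的 and 是 belong to Chinese grammar.

```python
def build_question_template(num_variables: int, constant_term: str, question_word: str) -> str:
    """Assemble "variable items + 的 + constant term + 是 + question word"."""
    # One positional slot per variable item, e.g. "{0}{1}" for two variable items.
    slots = "".join("{%d}" % i for i in range(num_variables))
    # 的 ("of") and 是 ("is") are the function words added by the grammar step.
    return slots + "的" + constant_term + "是" + question_word


template = build_question_template(2, "收入", "多少")
print(template)                              # {0}{1}的收入是多少
print(template.format("第一季度", "意外险"))  # 第一季度意外险的收入是多少
```

The second line reproduces the Chinese form of the first generated question, "What is the income of accident insurance in the first quarter?".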
In the third embodiment, building on the first embodiment, each step of the first embodiment is described further for the case where the answer file is in text format, as follows:
I. Build the answer file (as shown in Figure 9)
S11’. Collect answers.
S12’. Split each answer into a word sequence consisting of several keywords.
S13’. Keep, in each word sequence, the meaning keywords that express the meaning of the answer and the numerical keyword that expresses its value.
S14’. Arrange the meaning keywords and the numerical keyword of the same word sequence in order, separating adjacent keywords with a uniform keyword separator.
S15’. Separate different word sequences with a uniform word-sequence separator that differs from the keyword separator.
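Steps S14’ and S15’ amount to serializing each word sequence with two distinct separators. The sketch below assumes the keywords have already been extracted (S12’ and S13’ are not modelled), places the numerical keyword last in each sequence, uses the comma and semicolon from the worked example, and closes every sequence, including the last, with the word-sequence separator. `build_text_answer_file` is a hypothetical name.

```python
from typing import List, Sequence

KEYWORD_SEP = ","   # keyword separator (S14'); commas, as in the worked example
SEQUENCE_SEP = ";"  # word-sequence separator (S15'); must differ from KEYWORD_SEP


def build_text_answer_file(sequences: Sequence[Sequence[str]]) -> str:
    """Serialize extracted word sequences (meaning keywords first, numerical
    keyword last) into the text answer-file layout of steps S14' and S15'."""
    parts: List[str] = []
    for keywords in sequences:
        for kw in keywords:
            # A separator inside a keyword would make the file unparseable.
            if KEYWORD_SEP in kw or SEQUENCE_SEP in kw:
                raise ValueError(f"separator inside keyword: {kw!r}")
        # Join keywords with the keyword separator and close the sequence
        # with the word-sequence separator (including the last sequence).
        parts.append(KEYWORD_SEP.join(keywords) + SEQUENCE_SEP)
    return "".join(parts)


print(build_text_answer_file([["the first quarter", "accident insurance", "2560000"]]))
# the first quarter,accident insurance,2560000;
```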
II. Build the question template (as shown in Figure 10)
S21’. Determine the number of variable items in the question template from the number of keyword separators in one word sequence of the answer file.
S22’. Following the grammar, insert function words between the variable items, the constant term, and the question word to form a grammatical question template.
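Under the layout assumed above (n meaning keywords followed by one numerical keyword), a word sequence for a template with n variable items contains exactly n keyword separators, so step S21’ reduces to counting them. A hypothetical helper that also checks every sequence has the same shape:

```python
def count_variable_items(answer_text: str, keyword_sep: str = ",",
                         sequence_sep: str = ";") -> int:
    """Step S21': the number of keyword separators inside one word sequence
    equals the number of variable items in the question template."""
    # Drop empty pieces left by the closing separator or by blank lines.
    sequences = [s.strip() for s in answer_text.split(sequence_sep) if s.strip()]
    if not sequences:
        raise ValueError("answer file contains no word sequences")
    counts = {s.count(keyword_sep) for s in sequences}
    # Every sequence should have the same shape: n meaning keywords + 1 value.
    if len(counts) != 1:
        raise ValueError("word sequences contain different numbers of keywords")
    return counts.pop()


print(count_variable_items("the first quarter,accident insurance,2560000;"))  # 2
```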
III. Set the constant term and the question word (same as in the second embodiment; not repeated here)
IV. Form question-answer pairs (as shown in Figure 11)
S40’. Take the position of the first word-sequence separator in the text as the current word-sequence separator position, and the positions of the keyword separators before it as the current keyword separator positions.
S41’. According to the question-answer pair generation rule, embed the meaning keyword preceding each current keyword separator into the corresponding variable-item position of the question template to generate a question, and store it temporarily.
S42’. Take the numerical keyword preceding the current word-sequence separator and store it temporarily as the answer.
S43’. Save the temporarily stored question and answer in association with each other.
S44’. Determine whether the current word-sequence separator is the last word-sequence separator in the answer file; if so, go to step S47’; otherwise, go to step S45’.
S45’. Move on to the next word-sequence separator and reset the current word-sequence separator position to it.
S46’. Reset the current keyword separator positions to the positions of the keyword separators preceding the current word-sequence separator, and return to step S41’.
S47’. End.
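A sketch of steps S40’ to S47’ that mirrors the separator-position bookkeeping with `str.find`: the current word-sequence separator is tracked by index, while the keywords before it are obtained by splitting on the keyword separator rather than by locating each keyword separator one at a time. The function name, the positional-slot template, and the skipping of blank sequences are assumptions for illustration.

```python
from typing import List, Tuple


def generate_pairs_from_text(text: str, template: str,
                             keyword_sep: str = ",",
                             sequence_sep: str = ";") -> List[Tuple[str, str]]:
    """Sketch of steps S40'-S47' for a text answer file.

    Each word sequence is "meaning_1,...,meaning_n,value" closed by
    sequence_sep; template has positional slots {0}..{n-1}.
    """
    pairs: List[Tuple[str, str]] = []
    start = 0
    # S40': locate the first word-sequence separator.
    end = text.find(sequence_sep, start)
    while end != -1:
        sequence = text[start:end].strip()
        if sequence:  # tolerate blank lines or doubled separators
            # Keywords before each keyword separator are meaning keywords; the
            # keyword just before the sequence separator is the numerical one.
            *meaning, value = (kw.strip() for kw in sequence.split(keyword_sep))
            # S41': embed the meaning keywords into the variable-item slots.
            question = template.format(*meaning)
            # S42'/S43': the numerical keyword is the answer; store the pair.
            pairs.append((question, value))
        # S44'-S46': move on to the next word-sequence separator; once none is
        # left, the last one has been processed and the loop ends (S47').
        start = end + len(sequence_sep)
        end = text.find(sequence_sep, start)
    return pairs


if __name__ == "__main__":
    answer_file = "the first quarter,accident insurance,2560000;"
    # Slot {1} is the insurance type and {0} the quarter, in file order.
    print(generate_pairs_from_text(answer_file, "What is the income of {1} in {0}?"))
```

Any text after the last word-sequence separator is ignored, matching the rule that processing ends at the last separator.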
The following again takes building a standard knowledge base of the quarterly income of each insurance type as a detailed example:
1. Collect the answers on the quarterly income of each insurance type (as in the previous example) and organize them in text form into an answer file as shown in Figure 12, separating word sequences with semicolons and the keywords within the same word sequence with commas. (Other symbols may be used as separators if preferred.)
2. Since each word sequence in the text contains two meaning keywords, the question template has two variable items; following the grammar, it is set to "two variable items + 的 (of) + one constant term + 是 (is) + question word".
3. From the collected answers, the constant term is "income" and the question word is "how much", so the more specific template for this answer file is "two variable items + 的 + income + 是 + how much".
4. Locate the first semicolon and the commas before it. Taking the commas in order, embed the meaning keyword before each comma into the corresponding variable-item position of the template, generating the question "What is the income of accident insurance in the first quarter?"; then take the numerical keyword before the semicolon as the answer, "2560000", and save the question and answer in the standard knowledge base as an associated question-answer pair. Next, locate the second semicolon and the commas before it, generate a question and answer by the same rule, and save them as a question-answer pair. Continue in this way, generating question-answer pairs and saving them to the standard knowledge base, until the last semicolon has been processed.
In addition, the present application provides a computer-readable storage medium storing a standard knowledge base construction system 20. When executed by one or more processors, the construction system 20 implements the above method for constructing a standard knowledge base or the operations of the above electronic device.
Although specific embodiments of the present application have been described above, those skilled in the art will understand that they are merely illustrative and that the scope of protection of the present application is defined by the appended claims. Those skilled in the art may make various changes or modifications to these embodiments without departing from the principle and essence of the present application, and all such changes and modifications fall within the scope of protection of the present application.

Claims (17)

  1. A method for constructing a standard knowledge base, characterized by comprising the following steps:
    S1. Building an answer file: collecting answers, splitting the answers in a uniform format, and organizing them into a single file, the file being a table or text;
    S2. Building a question template: determining a question template according to the answers in the answer file, the question template being a word sequence comprising several variable items, several constant terms, and a question word;
    S3. Setting the constant terms and the question word: determining the constant terms and the question word in the question template according to the meaning expressed by the answers in the answer file;
    S4. Forming question-answer pairs: according to a question-answer pair generation rule, embedding content at corresponding positions of the answer file into the corresponding variable items of the question template to generate a question, obtaining content at corresponding positions of the answer file to generate an answer, and saving the generated question and answer in association as a question-answer pair.
  2. The method for constructing a standard knowledge base according to claim 1, characterized in that, when the file in step S1 is a table, step S1 comprises the following sub-steps:
    S11. Collecting answers;
    S12. Splitting each answer into a word sequence consisting of several keywords;
    S13. Obtaining, in each word sequence, the two meaning keywords that express the meaning of the answer;
    S14. Deduplicating and classifying the meaning keywords;
    S15. Using one class of meaning keywords as the first row of the table and the other class as the first column, with the cell where the first row and first column intersect left blank;
    S16. Obtaining, in each word sequence, the numerical keyword that expresses the value of the answer;
    S17. Filling the numerical keyword into the cell where the row and column of the two meaning keywords in its word sequence intersect.
  3. The method for constructing a standard knowledge base according to claim 2, characterized in that step S2 specifically comprises the following sub-steps:
    S21. Determining the number of variable items in the question template according to the number of classes of meaning keywords in the answer file;
    S22. Inserting function words between the variable items, the constant terms, and the question word according to the grammar, to form a grammatical question template.
  4. The method for constructing a standard knowledge base according to claim 2, characterized in that step S4 specifically comprises the following sub-steps:
    S40. Taking the first non-blank cell in the first row and the first non-blank cell in the first column of the table as the current positions of the two variable items;
    S41. According to the question-answer pair generation rule, embedding the meaning keywords at the current positions into the corresponding variable-item positions of the question template to generate a question, and storing it temporarily;
    S42. Taking the numerical keyword in the cell where the row and the column of the two meaning keywords of the generated question intersect, and storing it temporarily as the answer;
    S43. Saving the temporarily stored question and answer in association with each other;
    S44. Determining whether the meaning keyword at the current position of the first variable item is the last word in the first row or first column in which it is located; if so, executing step S46; otherwise, executing step S45;
    S45. Advancing the current position of the first variable item by one cell along the first row or first column containing its meaning keyword, updating the current position of the first variable item accordingly, and executing step S41;
    S46. Determining whether the meaning keyword at the current position of the second variable item is the last word in the first column or first row in which it is located; if so, executing step S48; otherwise, executing step S47;
    S47. Advancing the current position of the second variable item by one cell along the first column or first row containing its meaning keyword, and executing step S41;
    S48. Ending.
  5. The method for constructing a standard knowledge base according to claim 1, characterized in that, when the file in step S1 is text, step S1 comprises the following sub-steps:
    S11’. Collecting answers;
    S12’. Splitting each answer into a word sequence consisting of several keywords;
    S13’. Keeping, in each word sequence, the meaning keywords that express the meaning of the answer and the numerical keyword that expresses the value of the answer;
    S14’. Arranging the meaning keywords and the numerical keyword of the same word sequence in order, and separating the keywords with a uniform keyword separator;
    S15’. Separating different word sequences with a uniform word-sequence separator that differs from the keyword separator.
  6. The method for constructing a standard knowledge base according to claim 5, characterized in that step S2 specifically comprises the following sub-steps:
    S21’. Determining the number of variable items in the question template according to the number of keyword separators in one word sequence of the answer file;
    S22’. Inserting function words between the variable items, the constant terms, and the question word according to the grammar, to form a grammatical question template.
  7. The method for constructing a standard knowledge base according to claim 5, characterized in that step S4 comprises the following sub-steps:
    S40’. Taking the position of the first word-sequence separator in the text as the current word-sequence separator position, and the positions of the keyword separators preceding the first word-sequence separator as the current keyword separator positions;
    S41’. According to the question-answer pair generation rule, embedding the meaning keyword preceding each current keyword separator into the corresponding variable-item position of the question template to generate a question, and storing it temporarily;
    S42’. Taking the numerical keyword preceding the current word-sequence separator and storing it temporarily as the answer;
    S43’. Saving the temporarily stored question and answer in association with each other;
    S44’. Determining whether the current word-sequence separator is the last word-sequence separator in the answer file; if so, executing step S47’; otherwise, executing step S45’;
    S45’. Moving to the next word-sequence separator and resetting the current word-sequence separator position to it;
    S46’. Resetting the current keyword separator positions to the positions of the keyword separators preceding the current word-sequence separator, and executing step S41’;
    S47’. Ending.
  8. The method for constructing a standard knowledge base according to any one of claims 1 to 7, characterized in that step S3 specifically comprises the following sub-steps:
    S31. Splitting each answer into a word sequence consisting of several keywords;
    S32. Taking, as a constant term, an abstract noun among the meaning keywords that express the meaning of the answer in the word sequence;
    S33. Setting, after the constant term, a question word suitable for asking about the numerical keywords in the answer file.
  9. An electronic device comprising a memory and a processor, characterized in that the memory stores a standard knowledge base construction system executable by the processor, the construction system comprising:
    a file receiving module, configured to receive an organized answer file containing at least one answer, the answers having been split in a uniform format and organized into a single file, the file being a table or text;
    a template setting module, configured to set a question template according to the answers in the answer file, the question template being a word sequence comprising several variable items, several constant terms, and a question word;
    an input module, configured to receive the content of the constant terms and the question word;
    a question-answer pair generation module, configured to, according to a question-answer pair generation rule, embed content at corresponding positions of the received answer file into the corresponding variable items of the question template to generate a question, obtain content at corresponding positions of the answer file to generate an answer, and save the generated question and answer in association as a question-answer pair.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a standard knowledge base construction system executable by at least one processor to implement the following steps:
    S1. Building an answer file: collecting answers, splitting the answers in a uniform format, and organizing them into a single file, the file being a table or text;
    S2. Building a question template: determining a question template according to the answers in the answer file, the question template being a word sequence comprising several variable items, several constant terms, and a question word;
    S3. Setting the constant terms and the question word: determining the constant terms and the question word in the question template according to the meaning expressed by the answers in the answer file;
    S4. Forming question-answer pairs: according to a question-answer pair generation rule, embedding content at corresponding positions of the answer file into the corresponding variable items of the question template to generate a question, obtaining content at corresponding positions of the answer file to generate an answer, and saving the generated question and answer in association as a question-answer pair.
  11. The computer-readable storage medium according to claim 10, characterized in that, when the file in step S1 is a table, step S1 comprises the following sub-steps:
    S11. Collecting answers;
    S12. Splitting each answer into a word sequence consisting of several keywords;
    S13. Obtaining, in each word sequence, the two meaning keywords that express the meaning of the answer;
    S14. Deduplicating and classifying the meaning keywords;
    S15. Using one class of meaning keywords as the first row of the table and the other class as the first column, with the cell where the first row and first column intersect left blank;
    S16. Obtaining, in each word sequence, the numerical keyword that expresses the value of the answer;
    S17. Filling the numerical keyword into the cell where the row and column of the two meaning keywords in its word sequence intersect.
  12. The computer-readable storage medium according to claim 11, characterized in that step S2 specifically comprises the following sub-steps:
    S21. Determining the number of variable items in the question template according to the number of classes of meaning keywords in the answer file;
    S22. Inserting function words between the variable items, the constant terms, and the question word according to the grammar, to form a grammatical question template.
  13. The computer-readable storage medium according to claim 11, characterized in that step S4 specifically comprises the following sub-steps:
    S40. Taking the first non-blank cell in the first row and the first non-blank cell in the first column of the table as the current positions of the two variable items;
    S41. According to the question-answer pair generation rule, embedding the meaning keywords at the current positions into the corresponding variable-item positions of the question template to generate a question, and storing it temporarily;
    S42. Taking the numerical keyword in the cell where the row and the column of the two meaning keywords of the generated question intersect, and storing it temporarily as the answer;
    S43. Saving the temporarily stored question and answer in association with each other;
    S44. Determining whether the meaning keyword at the current position of the first variable item is the last word in the first row or first column in which it is located; if so, executing step S46; otherwise, executing step S45;
    S45. Advancing the current position of the first variable item by one cell along the first row or first column containing its meaning keyword, updating the current position of the first variable item accordingly, and executing step S41;
    S46. Determining whether the meaning keyword at the current position of the second variable item is the last word in the first column or first row in which it is located; if so, executing step S48; otherwise, executing step S47;
    S47. Advancing the current position of the second variable item by one cell along the first column or first row containing its meaning keyword, and executing step S41;
    S48. Ending.
  14. The computer-readable storage medium according to claim 10, characterized in that, when the file in step S1 is text, step S1 comprises the following sub-steps:
    S11’. Collecting answers;
    S12’. Splitting each answer into a word sequence consisting of several keywords;
    S13’. Keeping, in each word sequence, the meaning keywords that express the meaning of the answer and the numerical keyword that expresses the value of the answer;
    S14’. Arranging the meaning keywords and the numerical keyword of the same word sequence in order, and separating the keywords with a uniform keyword separator;
    S15’. Separating different word sequences with a uniform word-sequence separator that differs from the keyword separator.
  15. The computer-readable storage medium according to claim 14, characterized in that step S2 specifically comprises the following sub-steps:
    S21’. Determining the number of variable items in the question template according to the number of keyword separators in one word sequence of the answer file;
    S22’. Inserting function words between the variable items, the constant terms, and the question word according to the grammar, to form a grammatical question template.
  16. The computer-readable storage medium according to claim 14, characterized in that step S4 comprises the following sub-steps:
    S40’. Taking the position of the first word-sequence separator in the text as the current word-sequence separator position, and the positions of the keyword separators preceding the first word-sequence separator as the current keyword separator positions;
    S41’. According to the question-answer pair generation rule, embedding the meaning keyword preceding each current keyword separator into the corresponding variable-item position of the question template to generate a question, and storing it temporarily;
    S42’. Taking the numerical keyword preceding the current word-sequence separator and storing it temporarily as the answer;
    S43’. Saving the temporarily stored question and answer in association with each other;
    S44’. Determining whether the current word-sequence separator is the last word-sequence separator in the answer file; if so, executing step S47’; otherwise, executing step S45’;
    S45’. Moving to the next word-sequence separator and resetting the current word-sequence separator position to it;
    S46’. Resetting the current keyword separator positions to the positions of the keyword separators preceding the current word-sequence separator, and executing step S41’;
    S47’. Ending.
  17. The computer-readable storage medium according to any one of claims 10 to 16, characterized in that step S3 specifically comprises the following sub-steps:
    S31. Splitting each answer into a word sequence consisting of several keywords;
    S32. Taking, as a constant term, an abstract noun among the meaning keywords that express the meaning of the answer in the word sequence;
    S33. Setting, after the constant term, a question word suitable for asking about the numerical keywords in the answer file.
PCT/CN2018/076484 2017-10-26 2018-02-12 Method for building standard knowledge base, electronic device, and storage medium WO2019080419A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711031785.9 2017-10-26
CN201711031785.9A CN107832374A (en) 2017-10-26 2017-10-26 Construction method, electronic installation and the storage medium in standard knowledge storehouse

Publications (1)

Publication Number Publication Date
WO2019080419A1 (en) 2019-05-02

Family

ID=61650999

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/076484 WO2019080419A1 (en) 2017-10-26 2018-02-12 Method for building standard knowledge base, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN107832374A (en)
WO (1) WO2019080419A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109710747B (en) * 2019-01-16 2021-04-06 北京猎户星空科技有限公司 Information processing method and device and electronic equipment
CN110334197A (en) * 2019-06-28 2019-10-15 科大讯飞股份有限公司 Corpus processing method and relevant apparatus
CN112328762B (en) * 2020-11-04 2023-12-19 平安科技(深圳)有限公司 Question-answer corpus generation method and device based on text generation model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101261690A (en) * 2008-04-18 2008-09-10 北京百问百答网络技术有限公司 A system and method for automatic problem generation
CN104978396A (en) * 2015-06-02 2015-10-14 百度在线网络技术(北京)有限公司 Knowledge database based question and answer generating method and apparatus
CN107220296A (en) * 2017-04-28 2017-09-29 北京拓尔思信息技术股份有限公司 The generation method of question and answer knowledge base, the training method of neural network and equipment

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US10366621B2 (en) * 2014-08-26 2019-07-30 Microsoft Technology Licensing, Llc Generating high-level questions from sentences
CN104933097B (en) * 2015-05-27 2019-04-16 百度在线网络技术(北京)有限公司 A kind of data processing method and device for retrieval


Also Published As

CN107832374A (en); publication date 2018-03-23

Similar Documents

Publication Publication Date Title
US10621281B2 (en) Populating values in a spreadsheet using semantic cues
WO2020186786A1 (en) File processing method and apparatus, computer device and storage medium
WO2019062001A1 (en) Intelligent robotic customer service method, electronic device and computer readable storage medium
CN110292775B (en) Method and device for acquiring difference data
WO2019076062A1 (en) Function page customization method and application server
WO2019062010A1 (en) Semantic recognition method, electronic device and computer readable storage medium
CN1664810A (en) Assisted form filling
US11321361B2 (en) Genealogical entity resolution system and method
US10748166B2 (en) Method and system for mining churn factor causing user churn for network application
WO2019062078A1 (en) Smart customer service method, electronic apparatus and computer-readable storage medium
WO2019080420A1 (en) Method for customer service of human-robot collaboration, electronic device, and storage medium
WO2019085463A1 (en) Department demand recommendation method, application server, and computer-readable storage medium
CN112286934A (en) Database table importing method, device, equipment and medium
WO2019080419A1 (en) Method for building standard knowledge base, electronic device, and storage medium
US20230004979A1 (en) Abnormal behavior detection method and apparatus, electronic device, and computer-readable storage medium
CN104516635A (en) Content display management
WO2021169626A1 (en) Word library-based matching recommendation method, apparatus, device, and storage medium
US20150379112A1 (en) Creating an on-line job function ontology
CN114528413B (en) Knowledge graph updating method, system and readable storage medium supported by crowdsourced marking
CN115391439A (en) Document data export method, device, electronic equipment and storage medium
CN111475494A (en) Mass data processing method, system, terminal and storage medium
CN110737432A (en) script aided design method and device based on root list
CN106649210A (en) Data conversion method and device
CN113220951B (en) Medical clinic support method and system based on intelligent content
US20200202233A1 (en) Future scenario generating device and method, and computer program

Legal Events

Code 121, EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 18870348; country of ref document: EP; kind code of ref document: A1.

Code NENP: non-entry into the national phase. Ref country code: DE.

Code 32PN, EP: public notification in the EP bulletin as address of the addressee cannot be established. Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 13.10.2020).

Code 122, EP: PCT application non-entry in European phase. Ref document number: 18870348; country of ref document: EP; kind code of ref document: A1.