CN110532265B - Method and device for constructing question-answering system based on product instruction manual and computing equipment

Info

Publication number: CN110532265B
Application number: CN201910766891.4A
Authority: CN
Inventors: 翟羽佳; 梁霄; 石智中
Original assignee: Beijing Cheerbright Technologies Co Ltd
Current assignee: Beijing Cheerbright Technologies Co Ltd
Priority date: 2019-08-20
Filing date: 2019-08-20
Publication date: 2022-03-18
Anticipated expiration: 2039-08-20
Also published as: CN110532265A

Abstract

The invention discloses a method, an apparatus and a computing device for constructing a question-answering system based on a product instruction manual. The method comprises the following steps: constructing a knowledge base from the product instruction manual, wherein each data entry of the knowledge base comprises an association between a knowledge point and a product, a component and a tag set, the tag set comprises one or more semantic tags, and the semantic tags represent operation/description information related to the component; constructing a question template library from a historical question library, wherein each data entry of the question template library comprises an association between a question template and a tag set; and combining the knowledge base and the question template library into a question-answering system, so that when a user question is received, the user question is matched against the question template library to obtain the tag set associated with the user question, and a corresponding knowledge point is then queried from the knowledge base, according to the product, the component and the tag set associated with the user question, as the answer to the user question.

Description

Method and device for constructing question-answering system based on product instruction manual and computing equipment

Technical Field

The invention relates to the field of data processing, in particular to a method and a device for constructing a question-answering system based on a product instruction manual and computing equipment.

Background

Product instruction manuals (e.g., automobile user manuals) typically describe how to use and maintain a product (e.g., a certain automobile model) and how to handle common problems. Familiarity with the product instruction manual helps users avoid many common mistakes and greatly helps with the use and maintenance of the product. With the development of artificial intelligence technology in recent years, a question-answering system about product usage can be built on various intelligent terminals, so that problems people encounter when using a product can be solved conveniently and quickly through natural language interaction, which has significant application value. The product instruction manual is a rich and authoritative knowledge source provided by the manufacturer, but it is unstructured and its content is complex, so constructing a question-answering system around the knowledge in a product instruction manual remains a challenging problem.

Currently, two main approaches are used to construct a question-answering system around a product instruction manual. The first approach is based on question-answer similarity matching: the best-matching answer to a question is retrieved directly from unstructured text such as the product instruction manual. This approach generally comprises two stages. The first stage is coarse ranking, in which a candidate answer set is obtained through keyword retrieval similar to that of a search engine. The second stage is fine ranking, in which the question and the candidate answers are given semantic representations (for example by a neural network), their semantic similarity is calculated, and the candidates are re-ranked to obtain the final answer.

This approach has relatively poor accuracy, and the quality of the answers obtained is uneven and difficult to control. On the one hand, for efficiency reasons the coarse ranking stage relies on keyword retrieval; although it includes operations such as query expansion, it is still difficult to fully capture the various expressions such as synonyms, near-synonyms and related words, so the retrieved candidate answer set is inaccurate. On the other hand, the key to the fine ranking stage is the semantic representation of the question and the candidate answers. Deep neural network methods are mostly used at present, but they are limited by the inconsistency between the semantic spaces of questions and answers: the model design must account for the interaction between question and answer, and a large training data set is also needed. This makes model training very difficult, and the quality of the resulting answers is ultimately uncontrollable.

The second approach is based on question-question similarity matching. The content of the product instruction manual is first broken down into potential question-answer pairs to form a question-answer library; the question most semantically similar to the user question is then retrieved from that library, and its answer is returned to the user as the final answer. The main disadvantages of this approach are that building a question-answer library with high accuracy and coverage is relatively costly, and the range of user questions that can be answered is very limited. Because the content of an automobile user manual is mainly unstructured text, with mixed images and text in varied forms, the accuracy of extracting the relevant knowledge is low; and because user questions are often unpredictable, it is difficult to build a complete question-answer library in one pass.

Disclosure of Invention

In view of the above problems, the present invention has been developed to provide a method, apparatus, and computing device for building a question-answering system based on a product instruction manual that overcomes or at least partially solves the above problems.

According to one aspect of the invention, there is provided a method for constructing a question-answering system based on a product instruction manual, the method being executed in a computing device, wherein the product instruction manual is organized by directory, each lowest-level directory corresponds to a knowledge point, and each knowledge point comprises a knowledge point title and knowledge point content. The method comprises the following steps:

constructing a knowledge base according to the product instruction manual, wherein each data entry of the knowledge base comprises an association between a knowledge point and a product, a component and a tag set, the tag set comprises one or more semantic tags, and the semantic tags represent operation/description information related to the component;

constructing a question template library according to a historical question library, wherein each data entry of the question template library comprises an association between a question template and a tag set; and

constructing the knowledge base and the question template library into a question-answering system, so that when a user question is received, the user question is matched against the question template library to obtain the tag set associated with the user question, and a corresponding knowledge point is then queried from the knowledge base, according to the product, the component and the tag set associated with the user question, as the answer to the user question.

Optionally, in the method for constructing a question-answering system based on a product instruction manual according to the present invention, constructing the knowledge base according to the product instruction manual comprises: determining the component associated with each knowledge point according to the knowledge point title; determining the knowledge point set associated with each component according to the associations between knowledge points and components; determining the tag set associated with each component according to the knowledge point set associated with that component; and, for each knowledge point: acquiring the tag set associated with the component associated with the knowledge point; matching the knowledge point title against the acquired tag set, and taking the one or more matched semantic tags as the tag set associated with the knowledge point; and adding the knowledge point, together with the product, component and tag set associated with it, to the knowledge base as a data entry.
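
The knowledge base construction steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the keyword table `COMPONENT_KEYWORDS`, the core word list `CORE_WORDS`, the function names and the dictionary layout of a data entry are all hypothetical, and plain substring matching stands in for whatever keyword and core word recognition a real system would use.

```python
from collections import defaultdict

# Hypothetical lookup tables (assumptions, not taken from the patent).
COMPONENT_KEYWORDS = {"wiper": ["wiper"], "engine": ["engine"], "seat": ["seat"]}
CORE_WORDS = ["replace", "model", "adjust", "clean"]


def find_component(title):
    """Return the first component whose keyword appears in the title, else None."""
    lowered = title.lower()
    for component, keywords in COMPONENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return component
    return None


def build_knowledge_base(product, knowledge_points):
    """knowledge_points: list of (title, content) tuples from one manual."""
    # Step 1: group knowledge points by the component named in their title.
    by_component = defaultdict(list)
    for title, content in knowledge_points:
        component = find_component(title)
        if component is not None:
            by_component[component].append((title, content))

    # Step 2: the tag set of a component is the summary of the core words
    # found in the titles of all knowledge points associated with it.
    component_tags = {
        component: {w for title, _ in points for w in CORE_WORDS if w in title.lower()}
        for component, points in by_component.items()
    }

    # Step 3: each knowledge point keeps the component tags matching its title.
    entries = []
    for component, points in by_component.items():
        for title, content in points:
            tags = {t for t in component_tags[component] if t in title.lower()}
            entries.append({
                "product": product,
                "component": component,
                "tags": tags,
                "title": title,
                "content": content,
            })
    return entries
```

For example, calling `build_knowledge_base("Audi A4", [("Replace the wiper blade", "..."), ("Engine model", "...")])` yields one entry for the component "wiper" with the tag "replace" and one entry for the component "engine" with the tag "model".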

Optionally, the method for constructing a question-answering system based on a product instruction manual according to the present invention, wherein the determining the parts associated with the knowledge points according to the knowledge point titles includes: if the keyword of a certain part appears in the knowledge point title, the part is determined as the part associated with the knowledge point.

Optionally, in the method for constructing a question-answering system based on a product instruction manual according to the present invention, the keywords of the component include: keywords associated with the name of the component and/or keywords for operating/describing the component.

Optionally, the method for constructing a question-answering system based on a product instruction manual according to the present invention, wherein the determining a set of tags associated with a component according to a set of knowledge points associated with the component includes: for each part: traversing each knowledge point in the set of knowledge points associated with the part; for each traversed knowledge point, extracting a core word representing the operation/description of the part from the knowledge point title of the knowledge point; and summarizing the core words corresponding to all the traversed knowledge points to obtain a label set associated with the part.

Optionally, the method for constructing a question-answering system based on the product instruction manual according to the invention, wherein each semantic tag in the tag set comprises one or more synonyms.

Optionally, in the method for constructing a question-answering system based on a product instruction manual according to the present invention, the knowledge base is a knowledge graph, wherein the product and the knowledge point correspond to nodes in the knowledge graph, and the component and the tag set correspond to relationships between the nodes.

Optionally, in the method for constructing a question-answering system based on a product instruction manual according to the present invention, constructing the question template library according to the historical question library comprises: screening out questions related to the product instruction manual from the historical question library to generate a candidate question library; for each question in the candidate question library: performing product entity recognition, component recognition and core word recognition on the question; if the recognized core word is a semantic tag in the tag set associated with the component, or a synonym of such a semantic tag, adding the semantic tag to the tag set of the question; and replacing the product entity in the question with the type of the product entity to obtain the question template of the question; and aggregating all the question templates to generate the question template library.
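
As an illustration of how a historical question might be generalized into a question template, the following Python sketch replaces the product entity with its type and collects the semantic tags whose synonyms occur in the question. The tables `PRODUCT_ENTITIES` and `COMPONENT_TAGS`, the placeholder `[car_series]` and the function name are assumptions made for this example; dictionary lookup and substring matching stand in for real entity, component and core word recognizers.

```python
# Hypothetical lookup tables (assumptions, not taken from the patent).
PRODUCT_ENTITIES = {"Audi A4": "[car_series]", "BMW X5": "[car_series]"}
COMPONENT_TAGS = {
    # component -> semantic tag -> synonyms of that tag (the tag itself included)
    "wiper": {"replace": ["replace", "change"], "clean": ["clean", "wash"]},
}


def make_template(question):
    """Return (template, component, tag_set), or None if no component is found."""
    # Replace each product entity with the type of the entity.
    template = question
    for entity, entity_type in PRODUCT_ENTITIES.items():
        template = template.replace(entity, entity_type)

    # Component recognition by keyword; questions without a known component
    # are treated as unrelated to the manual and skipped.
    lowered = question.lower()
    component = next((c for c in COMPONENT_TAGS if c in lowered), None)
    if component is None:
        return None

    # A semantic tag is added when the tag or one of its synonyms occurs.
    tags = {
        tag
        for tag, synonyms in COMPONENT_TAGS[component].items()
        if any(synonym in lowered for synonym in synonyms)
    }
    return template, component, tags
```

With these tables, the question "How do I change the wiper on an Audi A4?" becomes the template "How do I change the wiper on an [car_series]?" with component "wiper" and tag set {"replace"}.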

Optionally, in the method for constructing a question-answering system based on a product instruction manual according to the present invention, screening out the questions related to the product instruction manual from the historical question library comprises: matching the questions in the historical question library against the component keywords, so as to screen out the questions related to the product instruction manual.

Optionally, in the method for constructing a question-answering system based on a product instruction manual according to the present invention, if the recognized core word is neither a semantic tag in the tag set associated with the component nor a synonym of such a semantic tag, the semantic tags in the tag set associated with the component are matched against the question, and the matched semantic tags are added to the tag set of the question.

Optionally, in the method for constructing a question-answering system based on a product instruction manual according to the present invention, aggregating all the question templates to generate the question template library comprises: deduplicating identical question templates; merging similar question templates, wherein the tag set associated with a merged question template is the union of the tag sets respectively associated with the similar question templates; and adding each question template obtained by the deduplication and merging, together with its associated tag set, to the question template library as a data entry.
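
The deduplication and merging step can be sketched as follows. The text does not say how "similar" templates are detected, so this example assumes a character-level similarity from Python's standard `difflib.SequenceMatcher` and a hypothetical threshold of 0.9; identical templates have a ratio of 1.0 and are therefore deduplicated by the same rule.

```python
from difflib import SequenceMatcher


def merge_templates(templates, threshold=0.9):
    """templates: list of (template_text, tag_set) pairs.

    Returns a list of (template_text, tag_set) in which identical or similar
    templates are merged and the merged tag set is the union of their tag sets.
    The similarity measure and the threshold are assumptions for illustration.
    """
    merged = []  # list of [text, tag_set]; the first template seen is kept
    for text, tags in templates:
        for entry in merged:
            if SequenceMatcher(None, entry[0], text).ratio() >= threshold:
                entry[1] |= set(tags)  # union of the associated tag sets
                break
        else:
            merged.append([text, set(tags)])
    return [(text, tags) for text, tags in merged]
```

Keeping the first template seen as the representative of a merged group is a design choice of this sketch; a real system might instead keep the most frequent wording.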

Optionally, in the method for constructing a question-answering system based on a product instruction manual according to the present invention, matching the user question against the question template library to obtain the tag set associated with the user question comprises: performing product entity recognition and component recognition on the user question to obtain the product and the component associated with the user question; replacing the product entity in the user question with the type of the product entity to obtain a generalized user question; matching the generalized user question against the question template library to obtain the question template corresponding to the user question; and acquiring the tag set associated with that question template from the question template library as the tag set associated with the user question.

Optionally, in the method for constructing a question-answering system based on a product instruction manual according to the present invention, matching the generalized user question against the question template library to obtain the question template corresponding to the user question comprises: if a question template is matched in the question template library, taking the matched question template as the question template corresponding to the user question; and if no question template is matched in the question template library, calculating the similarity between the user question and each question template in the question template library, and taking the question template with the highest similarity as the question template corresponding to the user question.

Optionally, in the method for constructing a question-answering system based on a product instruction manual according to the present invention, the similarity is: the edit distance similarity between the user question and the question template; the vector similarity between the user question and the question template; or a weighted average of the edit distance similarity and the vector similarity.
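
A minimal Python sketch of this template matching is given below, assuming several details that the text leaves open: the edit distance is the Levenshtein distance normalized by the length of the longer string, the vector similarity is a cosine over bag-of-words counts (a stand-in for a learned sentence vector), and the weight of 0.5 is arbitrary. All function names are hypothetical.

```python
import math
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def edit_similarity(a, b):
    """1 minus the edit distance normalized by the longer length (assumption)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def vector_similarity(a, b):
    """Cosine similarity over bag-of-words counts (stand-in for embeddings)."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def combined_similarity(a, b, weight=0.5):
    """Weighted average of edit distance similarity and vector similarity."""
    return weight * edit_similarity(a, b) + (1 - weight) * vector_similarity(a, b)


def match_template(question, templates):
    """Exact match first; otherwise fall back to the most similar template."""
    if question in templates:
        return question
    return max(templates, key=lambda t: combined_similarity(question, t))
```

Here `templates` is a non-empty list of template strings, and `question` is the generalized user question in which the product entity has already been replaced by its type.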

Optionally, in the method for constructing a question-answering system based on a product instruction manual according to the present invention, querying the corresponding knowledge point from the knowledge base comprises: acquiring a candidate knowledge point set from the knowledge base according to the product and the component associated with the user question; calculating a first matching score between the tag set associated with the user question and the tag set associated with each knowledge point in the candidate knowledge point set, and taking the first matching score as the matching score of the user question and the knowledge point; and acquiring the knowledge points whose matching scores are larger than a predetermined threshold as the answers to the user question.

Optionally, in the method for constructing a question-answering system based on a product instruction manual according to the present invention, the first matching score is: a first ratio, namely the ratio of the intersection of the tag set associated with the user question and the tag set associated with the knowledge point to the union of the two; the vector similarity of the tag set associated with the user question and the tag set associated with the knowledge point; or a weighted average of the first ratio and the vector similarity.
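
The first ratio described above is the Jaccard index of the two tag sets. The following Python sketch computes it and keeps the candidate knowledge points whose score exceeds the threshold; the function names, the candidate representation and the threshold value of 0.5 are assumptions made for illustration.

```python
def first_ratio(question_tags, point_tags):
    """Size of the intersection of two tag sets divided by the size of their union."""
    union = question_tags | point_tags
    return len(question_tags & point_tags) / len(union) if union else 0.0


def select_answers(question_tags, candidates, threshold=0.5):
    """candidates: list of (title, tag_set) pairs for one product and component.

    Returns (title, score) pairs whose score is strictly larger than the
    threshold, ordered from the highest score to the lowest.
    """
    scored = [(title, first_ratio(question_tags, tags)) for title, tags in candidates]
    kept = [(title, score) for title, score in scored if score > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

For a user question tagged {"replace"}, a knowledge point tagged {"replace"} scores 1.0, while a knowledge point tagged {"replace", "clean"} scores 0.5 and is dropped because the score must be larger than the threshold.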

Optionally, the method for constructing a question-answering system based on a product instruction manual according to the present invention, wherein the querying the corresponding knowledge points from the knowledge base further includes: calculating the vector similarity between the user question and the title of the knowledge point as a second matching score; and taking the weighted average of the first matching score and the second matching score as the matching score of the user question and the knowledge point.

Optionally, in the method for constructing a question-answering system based on a product instruction manual according to the present invention, acquiring the knowledge points whose matching scores are larger than the predetermined threshold as the answers to the user question comprises: if a plurality of knowledge points have matching scores larger than the predetermined threshold, performing a directory check on those knowledge points, aggregating the knowledge points that belong to the same directory into one aggregated knowledge point, and taking a predetermined number of knowledge points with the highest matching scores as the answers to the user question.

Optionally, in the method for constructing a question-answering system based on a product instruction manual according to the present invention, the matching score of an aggregated knowledge point is the highest matching score among the knowledge points before aggregation.
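
The directory check and aggregation can be sketched in Python as below. The tuple layout `(directory, title, score)`, the function name and the default `top_n` are hypothetical; the sketch only shows that knowledge points sharing a directory are merged, that the merged score is the highest score before aggregation, and that a fixed number of top results is returned.

```python
def aggregate_by_directory(scored_points, top_n=3):
    """scored_points: list of (directory, title, score) above the threshold.

    Knowledge points in the same directory are merged into one aggregated
    knowledge point whose score is the highest score among its members.
    Returns at most top_n aggregated knowledge points, best first.
    """
    groups = {}
    for directory, title, score in scored_points:
        group = groups.setdefault(
            directory, {"directory": directory, "titles": [], "score": score}
        )
        group["titles"].append(title)
        group["score"] = max(group["score"], score)
    ranked = sorted(groups.values(), key=lambda g: g["score"], reverse=True)
    return ranked[:top_n]
```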

Optionally, in the method for constructing a question-answering system based on a product instruction manual according to the present invention, the product instruction manual is an automobile user manual.

According to another aspect of the present invention, there is provided an apparatus for constructing a question-answering system based on a product instruction manual, the apparatus residing in a computing device, wherein the product instruction manual is organized by directory, each lowest-level directory corresponds to a knowledge point, and each knowledge point comprises a knowledge point title and knowledge point content, the apparatus comprising:

a knowledge base construction unit, adapted to construct a knowledge base according to the product instruction manual, wherein each data entry of the knowledge base comprises an association between a knowledge point and a product, a component and a tag set, the tag set comprises one or more semantic tags, and the semantic tags represent operation/description information related to the component;

a question template library construction unit, adapted to construct a question template library according to a historical question library, wherein each data entry of the question template library comprises an association between a question template and a tag set; and

a question-answering system construction unit, adapted to construct the knowledge base and the question template library into a question-answering system, so that when a user question is received, the user question is matched against the question template library to obtain the tag set associated with the user question, and a corresponding knowledge point is then queried from the knowledge base, according to the product, the component and the tag set associated with the user question, as the answer to the user question.

According to yet another aspect of the invention, there is provided a computing device comprising: at least one processor; and a memory storing program instructions, wherein the program instructions are configured to be executed by the at least one processor, the program instructions comprising instructions for performing the above-described method.

According to yet another aspect of the present invention, there is provided a readable storage medium storing program instructions that, when read and executed by a computing device, cause the computing device to perform the above-described method.

According to the scheme of the invention for constructing a question-answering system based on a product instruction manual, an unstructured product instruction manual is first constructed into a structured knowledge base, in which each data entry comprises an association between a knowledge point and a product, a component and semantic tags. Next, a historical library of user questions related to the product instruction manual is collected, the entities, components and corresponding tag sets of the questions in that library are analyzed, and a question template library is built by generalizing the questions into templates, each question template being associated with a corresponding tag set. The question-answering system built on the knowledge base and the question template library can then analyze a user question semantically online, determine the product, component and semantic tag information related to the user question (the semantic tags are obtained by matching against the question template library), retrieve and infer related knowledge points in the knowledge base, rank them by the degree of semantic tag matching between the user question and the knowledge points and/or the similarity between the knowledge point titles and the user question, and finally obtain the answer. The scheme makes full use of the knowledge representation and reasoning capability of the knowledge base, and can obtain more accurate and semantically relevant answers.

The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 shows a schematic catalog diagram of an automotive user manual in an embodiment of the present invention;

FIG. 2 is a schematic diagram illustrating knowledge points of an automotive user manual in an embodiment of the present invention;

FIG. 3 shows a schematic diagram of a knowledge-graph of the automotive field in an embodiment of the invention;

FIG. 4 shows a block diagram of a computing device 400, according to one embodiment of the invention;

FIG. 5 illustrates a flow diagram of a method 500 for building a question-answering system based on a product instruction manual, according to one embodiment of the present invention;

FIG. 6 illustrates a flow chart for building a knowledge base from product instruction manuals in a method 500;

FIG. 7 is a flow diagram illustrating the construction of a question template library from a historical question library in method 500;

FIG. 8 is a flow chart illustrating a question answering system according to an embodiment of the present invention;

FIG. 9 shows a block diagram of an apparatus 900 for building a question-answering system based on a product instruction manual according to one embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

The embodiment of the invention provides a method for constructing a question-answering system based on a product instruction manual. The product instruction manual is an unstructured text which is usually organized in a directory, each bottom directory corresponds to a knowledge point, and each knowledge point comprises a knowledge point title and knowledge point contents.

As shown in FIG. 1, the content of an automobile user manual is organized by directory, and the content contained in a lowest-level directory is called a knowledge point, which is generally descriptive or operational content about the automobile. A knowledge point consists of a knowledge point title and knowledge point content. For example, in the automobile user manual of the Audi A4, under the directory "seat and set/front seat" there is a knowledge point titled "manually adjusted front seat", and the knowledge point content under that title gives 6 steps for manually adjusting the front seat (as shown in FIG. 2).

The method for constructing the question-answering system based on the product instruction manual can be executed in computing equipment. FIG. 4 shows a block diagram of a computing device 400, according to one embodiment of the invention. As shown in FIG. 4, in a basic configuration 402, a computing device 400 typically includes a system memory 406 and one or more processors 404. A memory bus 408 may be used for communicating between the processor 404 and the system memory 406.

Depending on the desired configuration, processor 404 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. Processor 404 may include one or more levels of cache, such as a level one cache 410 and a level two cache 412, a processor core 414, and registers 416. The example processor core 414 may include an Arithmetic Logic Unit (ALU), a Floating Point Unit (FPU), a digital signal processing core (DSP core), or any combination thereof. The example memory controller 418 may be used with the processor 404, or in some implementations the memory controller 418 may be an internal part of the processor 404.

Depending on the desired configuration, system memory 406 may be any type of memory including, but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. System memory 406 may include an operating system 420, one or more applications 422, and program data 424. The application 422 is actually a plurality of program instructions that direct the processor 404 to perform corresponding operations. In some implementations, the application 422 can be arranged to cause the processor 404 to operate with the program data 424 on an operating system.

Computing device 400 may also include an interface bus 440 that facilitates communication from various interface devices (e.g., output devices 442, peripheral interfaces 444, and communication devices 446) to the basic configuration 402 via bus/interface controller 430. Example output devices 442 include a graphics processing unit 448 and an audio processing unit 450. They may be configured to facilitate communication with various external devices, such as a display or speakers, via one or more A/V ports 452. Example peripheral interfaces 444 may include a serial interface controller 454 and a parallel interface controller 456, which may be configured to facilitate communications with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 458. An example communication device 446 may include a network controller 460, which may be arranged to facilitate communications with one or more other computing devices 462 over a network communication link via one or more communication ports 464.

A network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures or program modules in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or dedicated-line network, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media. The term computer readable media as used herein may include both storage media and communication media.

In a computing device 400 according to the present invention, application 422 includes an apparatus 900 for building a question-answering system based on a product instruction manual, and apparatus 900 includes a plurality of program instructions that may instruct processor 404 to perform method 500.

FIG. 5 illustrates a flow diagram of a method 500 for building a question-answering system based on a product instruction manual, according to one embodiment of the invention. As previously mentioned, a product instruction manual is an unstructured text, usually organized by directory, with each lowest-level directory corresponding to a knowledge point, and each knowledge point including a knowledge point title and knowledge point content. The product instruction manual may come from any of various fields, for example automobiles, air conditioners or televisions. Hereinafter, a product instruction manual in the automobile field, that is, an automobile user manual, is described as an example, but the present invention is not limited thereto and applies to product instruction manuals in any field.

Referring to FIG. 5, the method 500 begins at step S502. In step S502, a knowledge base is built according to the product instruction manual. The knowledge base includes a plurality of data entries, each data entry includes an association between a knowledge point and a product, a component and a tag set, the tag set includes one or more semantic tags, and the semantic tags represent operation/description information related to the component.

Taking the automobile user manual as an example, as can be seen from the directory structure shown in FIG. 1, the knowledge points of an automobile user manual are generally associated with particular components of the automobile, such as the engine, the tires or the wipers. For example, the knowledge point "replacing the wiper blade" is associated with the component "wiper". Each knowledge point can therefore be associated with a corresponding automobile component.

In addition, the content of a knowledge point in an automobile user manual is generally operation/description information related to a component, such as the model of the engine or the method of replacing a tire, so semantic tags such as "model" and "replacement" can be assigned to the related knowledge points. For example, for the knowledge point "engine model", the corresponding component is "engine" and the corresponding semantic tag is "model".

Thus, by extracting components and semantic labels from knowledge points, one or more unstructured product instruction manuals can be built into a structured knowledge base. The product instruction manual is a digitized manual, and a plurality of product instruction manuals in the same field are processed to create one knowledge base for that field. For example, in the automobile field, different car series have different automobile user manuals, and one knowledge base for the automobile field can be generated from the automobile user manuals of all car series. Each data entry of the knowledge base corresponds to a knowledge point. The knowledge point is associated first with a product and a component of that product, and then with a label set. The label set may contain one semantic label, indicating that the knowledge point relates to one piece of semantic information about the component of the product, or several semantic labels, indicating that the knowledge point relates to several pieces of semantic information about that component.

In one implementation, the knowledge base may be constructed as part of a knowledge graph, in which products and knowledge points correspond to nodes, and components and label sets correspond to the relationships between nodes. The knowledge graph further includes superior-inferior (hypernym-hyponym) relationships among products, components and the like; using these relationships enhances the inference capability and improves the accuracy of finding knowledge points. For example, when the product is an automobile, the car series and the knowledge points are nodes in the graph, the components and the semantic labels (one knowledge point may correspond to one or more semantic labels) correspond to the relationships between the nodes, and the knowledge graph may further include superior-inferior relationships among automobile terms, car series, components and the like, as shown in fig. 3.

Fig. 6 shows a flow diagram for building a knowledge base from product instruction manuals in method 500. Referring to fig. 6, the method begins at step S602. In step S602, the component associated with a knowledge point is determined from the knowledge point title. A keyword dictionary of components may be preset, and the component associated with each knowledge point in each product instruction manual is extracted by keyword matching: if a keyword of a component in the keyword dictionary appears in a knowledge point title, that component is determined as the component associated with the knowledge point, and the knowledge point is marked as subordinate to that component. Here, the keywords of a component may include keywords associated with the name of the component and/or keywords that operate on or describe the component. For example, the knowledge point "manually adjusted front seat" has the associated component "front seat". As another example, the knowledge point "engine oil model of BMW X3" has the associated component "engine"; that is, the keywords corresponding to the engine in the keyword dictionary also include "engine oil".

Through the processing of step S602, association with a part is completed for each knowledge point in a plurality of product instruction manuals of a certain field (e.g., automobile field). Then, in step S604, a knowledge point set associated with the component is determined based on the association relationship between the knowledge point and the component. Specifically, each knowledge point is associated with one component, and then, by summarizing the components associated with all knowledge points, a plurality of knowledge points associated with each component can be obtained, and these knowledge points form a knowledge point set associated with the component.

For example, assume that knowledge point 1 is associated with component 1, knowledge point 2 with component 2, knowledge point 3 with component 3, knowledge point 4 with component 1, knowledge point 5 with component 1, and knowledge point 6 with component 3. Summarizing these associations gives: component 1 is associated with {knowledge point 1, knowledge point 4, knowledge point 5}, component 2 with {knowledge point 2}, and component 3 with {knowledge point 3, knowledge point 6}.
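Steps S602 and S604 can be sketched as follows. This is a minimal illustration only: the keyword dictionary `COMPONENT_KEYWORDS`, the function names and the sample titles are hypothetical and are not part of the described embodiment.

```python
from collections import defaultdict

# Hypothetical keyword dictionary: component -> keywords that indicate it.
COMPONENT_KEYWORDS = {
    "engine": ["engine", "engine oil"],
    "front seat": ["front seat"],
    "wiper": ["wiper"],
}


def find_component(title):
    """Step S602: return the first component whose keyword appears in the title."""
    lowered = title.lower()
    for component, keywords in COMPONENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return component
    return None  # the title mentions no known component


def group_by_component(titles):
    """Step S604: collect, for each component, the titles associated with it."""
    groups = defaultdict(list)
    for title in titles:
        component = find_component(title)
        if component is not None:
            groups[component].append(title)
    return dict(groups)


titles = [
    "Manually adjusted front seat",
    "Engine oil model",
    "Replacing the wiper blade",
    "Engine model",
]
groups = group_by_component(titles)
```

Here `groups` maps each component to the list of knowledge point titles associated with it, which is the knowledge point set used in step S606.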

Then, in step S606, a labelset (component-labelset) associated with the component is determined from the knowledge point set associated with the component. Specifically, for each component, the following process may be performed to determine the set of tags associated with the component:

1) each knowledge point in the set of knowledge points associated with the part is traversed.

2) For each traversed knowledge point, extracting core words representing the operation/description of the part from the knowledge point titles of the knowledge points. The knowledge point heading may be parsed to extract its core words, e.g., a heading of "engine model" may extract the core word "model".

3) Summarizing the core words corresponding to all the traversed knowledge points to obtain the label set associated with the part. The summarizing may include de-duplication and synonym grouping. For example, the engine-related semantic labels include "model", "start", "keyless start", etc., and the label "add engine oil" has a series of synonyms such as "prime" and "supplement".
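The traversal, core word extraction and summarizing described above can be sketched as follows. The text extracts core words by syntactic analysis; the sketch below substitutes a much simpler stand-in (dropping the component keyword and a few stop words), and the tables `STOP_WORDS` and `SYNONYM_TO_TAG` are hypothetical.

```python
# Hypothetical helper tables, for illustration only.
STOP_WORDS = {"the", "a", "of", "to"}
SYNONYM_TO_TAG = {"startup": "start", "starting": "start"}


def extract_core_words(title, component_keyword):
    """Simplified stand-in for syntactic analysis of a knowledge point title."""
    words = title.lower().replace(component_keyword, " ").split()
    return [word for word in words if word not in STOP_WORDS]


def build_tag_set(titles, component_keyword):
    """Step S606: de-duplicate core words and fold synonyms into one tag."""
    tags = set()
    for title in titles:
        for word in extract_core_words(title, component_keyword):
            tags.add(SYNONYM_TO_TAG.get(word, word))
    return tags


engine_titles = [
    "Engine model",
    "Starting the engine",
    "Engine startup",
    "Model of the engine",
]
engine_tags = build_tag_set(engine_titles, "engine")
```

Using a set gives de-duplication for free, and the synonym table maps every surface form to one canonical tag before it is added.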

For the automotive field, a component-label system is formed that covers major components such as the engine, tires, wipers, seats and safety belts, where the labels of each component describe information and operations related to that component. The following is a schematic excerpt of the component-label system for the "engine". Each row is one label associated with the engine, and the comma-separated items within a row are synonyms of that label:

oil, engine oil

Close and stop

Starting, or starting

Can not be started, started and operated

…

Then, in step S608, each knowledge point in the product instruction manual is structured and added to the knowledge base. Specifically, for each knowledge point, the following operations are performed:

1) acquiring products and components associated with the knowledge points, and acquiring a label set associated with the components;

2) matching the knowledge point title with the acquired label set, and taking one or more matched semantic labels as a label set associated with the knowledge point;

3) the knowledge point and the set of products, components and tags associated with the knowledge point are added to the knowledge base as a data entry.

For example, the automobile user manual for the BMW X3 contains a knowledge point "change wheel". After structuring, the data entry for this knowledge point in the knowledge base is:

knowledge point: "change wheel" (note: the details are found under the corresponding directory entry of the BMW X3 automobile user manual), product: BMW X3, part: wheel, semantic label: replace.
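The structuring in step S608 can be sketched as follows. The `KnowledgeEntry` class, the `match_tags` helper and the synonym table `wheel_tags` are hypothetical names introduced for illustration; the embodiment does not prescribe a storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeEntry:
    """One data entry of the knowledge base (illustrative layout)."""
    title: str
    content: str
    product: str
    component: str
    tags: frozenset


def match_tags(title, component_tags):
    """Match a title against the component's tag set.

    component_tags maps each semantic tag to a list of its synonyms.
    """
    lowered = title.lower()
    return frozenset(
        tag
        for tag, synonyms in component_tags.items()
        if any(word in lowered for word in [tag, *synonyms])
    )


# Hypothetical tag set for the component "wheel".
wheel_tags = {"replace": ["change", "swap"], "pressure": []}

entry = KnowledgeEntry(
    title="Change wheel",
    content="(details from the manual)",
    product="BMW X3",
    component="wheel",
    tags=match_tags("Change wheel", wheel_tags),
)
```

The title contains the synonym "change", so the entry is stored with the canonical semantic tag "replace", matching the example entry above.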

After the knowledge base corresponding to one or more product instruction manuals (e.g., a plurality of automobile user manuals in the automobile field) is constructed, the method 500 proceeds to step S504. In step S504, a question template library is constructed from the historical question library. The question template library includes a plurality of data entries, each data entry includes an association relationship between a question template and a label set, and the label set includes one or more semantic labels representing operation/description information related to the component. Specifically, the historical question library is mined and a semantic label extraction technique is used to build a template library that maps generalized questions to semantic labels. Matching a question against the templates in this library allows its semantic labels to be resolved accurately, which provides the basis for semantic analysis of user questions.

FIG. 7 illustrates a flow diagram for building a problem template library from a historical problem library in method 500. Referring to fig. 7, the method starts in step S702, and in step S702, a question related to a product instruction manual is screened from a history question library to generate a candidate question library. The historical problem bank has a large number of user problems, some of which are related to the product instruction manual, and some of which are unrelated to the product instruction manual. Therefore, in this step, the questions in the history question bank can be matched with the component keywords (keywords in the keyword dictionary related to the component), and the questions related to the product instruction manual can be screened out.

In step S704, the questions in the candidate question bank are generalized as a question template. Specifically, for each question in the candidate question bank, the following processing is performed:

1) Performing product entity recognition, component recognition and core word recognition on the question. Entity recognition, also called named entity recognition, refers to recognizing entities with specific meanings in text, mainly names of people, places, organizations and products, proper nouns and the like. In the embodiment of the present invention, product entity recognition refers to recognizing a product entity in the question text; for example, the product entity recognized in "which engine oil does the Golf use" is "Golf". Component recognition may match the question text against the component keyword dictionary. Core word recognition refers to recognizing, in the question text, a core word representing the operation/description of the component; the core word can be extracted through syntactic analysis, for example, the core word "model" can be extracted from "engine model".

2) If the identified core word is a semantic label in a label set associated with the component or a synonym of the semantic label, adding the semantic label into the label set of the problem;

and if the identified core word is not the semantic label in the label set associated with the component or is not the synonym of the semantic label, matching the problem by using the semantic label in the label set associated with the component, and adding the matched semantic label into the label set of the problem.

3) Replacing the product entity in the question with the type of the product entity to obtain the question template of the question. Here, the product entity refers to a specific product, and the type of the product entity refers to the category to which that specific product belongs. For example, BMW X3 and Golf are both product entities, and the type of both is "car series".

Thus, for the question text "engine oil model of BMW X3", the corresponding question template is "engine oil model of {car series}", and the corresponding label set is {"engine oil", "model"}.
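The generalization in step S704 can be sketched as follows. All lookup tables are hypothetical, and the sketch simplifies sub-step 2): instead of first recognizing a core word by syntactic analysis, it directly matches the question text against the component's labels and their synonyms.

```python
# Hypothetical lookup tables, for illustration only.
PRODUCT_ENTITIES = {"bmw x3": "car series", "golf": "car series"}
COMPONENT_KEYWORDS = {"engine": ["engine"]}
COMPONENT_TAGS = {
    "engine": {"engine oil": ["oil"], "model": ["type", "specification"]},
}


def generalize(question):
    """Return (question template, tag set), or None if nothing is recognized."""
    text = question.lower()
    product = next((p for p in PRODUCT_ENTITIES if p in text), None)
    component = next(
        (c for c, kws in COMPONENT_KEYWORDS.items() if any(k in text for k in kws)),
        None,
    )
    if product is None or component is None:
        return None
    tags = {
        tag
        for tag, synonyms in COMPONENT_TAGS[component].items()
        if any(word in text for word in [tag, *synonyms])
    }
    # Replace the product entity with its type to obtain the template.
    template = text.replace(product, "{" + PRODUCT_ENTITIES[product] + "}")
    return template, tags


template, tags = generalize("Engine oil model of BMW X3")
```

For this input the sketch yields the template "engine oil model of {car series}" and the tag set {"engine oil", "model"}.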

Through the processing of step S704, a plurality of question templates are obtained, and each question template is associated with one tag set. Since the number of questions included in the candidate question bank is generally large, there are many identical or similar question templates in the obtained question templates, and therefore, in step S706, all the question templates are aggregated to generate the question template bank. The method specifically comprises the following steps:

1) the same question template is subjected to deduplication processing, i.e., only one question template is reserved for a plurality of same question templates.

2) Merging similar question templates, that is, keeping only one question template (which may be any one of them) out of a plurality of similar question templates. The label set associated with the merged question template is the union of the label sets associated with the similar question templates. Here, two question templates may be considered similar if their similarity is greater than a predetermined threshold. The similarity may be the edit distance similarity, the vector similarity, or a weighted average of the two; the invention does not limit the specific similarity calculation method, and those skilled in the art can choose reasonably according to specific requirements.

For example, if there are 3 similar question templates, template 1, template 2 and template 3, the question template obtained by merging them may be template 1, and its associated label set {label 1, label 2, label 3} is the union of the label sets of the three templates.

3) And adding each problem template obtained by the deduplication processing and the combination processing and the associated label set thereof into the problem template library as a data entry.
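The aggregation in step S706 can be sketched as follows. The similarity function here is `difflib.SequenceMatcher.ratio()` from the Python standard library, used only as a convenient stand-in for the edit distance or vector similarity mentioned above; the threshold value and the sample templates are likewise assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in similarity; any of the measures named in the text could be used.
    return SequenceMatcher(None, a, b).ratio()


def aggregate(templates, threshold=0.8):
    """Fold duplicate and similar templates together.

    templates: list of (template text, tag set) pairs.
    Returns a dict mapping each kept template to the union of merged tag sets.
    """
    library = {}
    for text, tags in templates:
        kept = next(
            (k for k in library if k == text or similarity(k, text) > threshold),
            None,
        )
        if kept is None:
            library[text] = set(tags)      # first template of its kind is kept
        else:
            library[kept] |= set(tags)     # merged template gets the union
    return library


library = aggregate([
    ("{car series} engine oil model", {"engine oil", "model"}),
    ("{car series} engine oil model", {"engine oil", "model"}),
    ("{car series} engine oil model?", {"engine oil", "type"}),
    ("{car series} how to replace coolant", {"replace", "coolant"}),
])
```

This greedy pass keeps the first template seen in each group of similar templates, which is one valid choice since the text allows any one of them to be kept.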

The template library section is schematically as follows:

{car series} how to turn off the automatic start-stop function -- [turn off, automatic start-stop]

{car series} engine oil model -- [engine oil, model]

{car series} replace coolant -- [replace, coolant]

{car series} can the engine be started by remote control -- [start, remote control]

…

After the question template library is built, the method 500 proceeds to step S506. In step S506, the knowledge base and the question template library are constructed into a question-answering system. Specifically, the knowledge base and the question template library are stored in the computing device, and a question-answering processing unit is created in the computing device. That is, the question-answering system comprises the knowledge base, the question template library and the question-answering processing unit. The question-answering processing unit is adapted to, on receiving a user question, match the user question against the question template library to obtain the label set associated with the user question, query the corresponding knowledge point from the knowledge base according to the product, component and label set associated with the user question, and return that knowledge point to the user as the answer to the user question.

Fig. 8 shows a flow chart of question answering performed by the question-answering system, i.e., the processing logic of the question-answering processing unit, according to the embodiment of the present invention. Referring to fig. 8, the method starts at step S802. In step S802, a user question is received, and product entity recognition and component recognition are performed on it to obtain the product and the component associated with the user question. Entity recognition, also called named entity recognition, refers to recognizing entities with specific meanings in text, mainly names of people, places, organizations and products, proper nouns and the like. In the embodiment of the present invention, product entity recognition refers to recognizing a product entity in the user question; for example, the product entity recognized in "which engine oil does the Golf use" is "Golf". Component recognition may match the user question against the component keyword dictionary.

Then, in step S804, the user question is matched with the question template library, and a tag set associated with the user question is obtained. The method specifically comprises the following steps:

1) Replacing the product entity in the user question with the type of the product entity to obtain the generalized user question. Here, the product entity refers to a specific product, and the type of the product entity refers to the category to which that specific product belongs. For example, BMW X3 and Golf are both product entities, and the type of both is "car series". For example, for the user question "engine oil model of BMW X3", the corresponding generalized user question is "engine oil model of {car series}".

2) And matching the generalized user problems with the problem template library to obtain a problem template corresponding to the user problems. If a question template is matched in the question template library, the matched question template is used as a question template corresponding to a user question; and if the problem template is not matched in the problem template library, calculating the similarity between the user problem and each problem template in the problem template library, and taking the problem template with the highest similarity as the problem template corresponding to the user problem.

The similarity of the user question to the question template may be: edit distance similarity, vector similarity, or a weighted average of the edit distance similarity and vector similarity.

Here, the edit distance similarity edit_simi(q, t) between the user question q and the question template t is calculated from EditDistance(q, t), the edit distance between the user question and the question template, and from |q| and |t|, the text lengths of the user question and the question template, respectively.

In addition, the user question and the question template may each be vectorized, with the vector of each obtained by averaging the word vectors of the words it contains. The vector similarity of the user question and the question template can then be expressed as the cosine similarity cos_simi(q, t) of the two vectors. Finally, the similarity simi(q, t) between the user question and the question template can be calculated as:

$$\mathrm{simi}(q,t) = a \times \mathrm{edit\_simi}(q,t) + (1-a) \times \mathrm{cos\_simi}(q,t),$$

where a is a hyperparameter that regulates the weight ratio.

3) And acquiring a label set associated with the question template from the question template library as a label set associated with the user question.
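The template matching of step S804 can be sketched as follows. The exact normalization used for edit_simi is not reproduced in this text, so the sketch assumes one common choice, 1 - EditDistance(q, t) / max(|q|, |t|); that choice, the function names and the sample values are assumptions rather than the formula of the embodiment.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance, computed with one rolling row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_simi(q, t):
    # ASSUMPTION: normalize by the longer text length.
    longest = max(len(q), len(t))
    return 1.0 if longest == 0 else 1.0 - edit_distance(q, t) / longest


def cos_simi(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 0.0 if norm == 0 else dot / norm


def simi(q, t, q_vec, t_vec, a=0.5):
    """Weighted average of edit distance similarity and vector similarity."""
    return a * edit_simi(q, t) + (1 - a) * cos_simi(q_vec, t_vec)


def match_template(generalized, templates, similarity):
    """Exact match first; otherwise the template with the highest similarity."""
    if generalized in templates:
        return generalized
    return max(templates, key=lambda t: similarity(generalized, t))


templates = ["{car series} engine oil model", "{car series} replace coolant"]
best = match_template("{car series} engine oil modle", templates, edit_simi)
```

In the usage lines, only the edit distance similarity is passed to `match_template` so that the example needs no word vectors; a caller with vectors available could pass a function built on `simi` instead.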

Then, in step S806, according to the product and the component associated with the user question, a candidate knowledge point set is obtained from the knowledge base, that is, the knowledge points of which the product and the component are the same as the product and the component associated with the user question are queried from the knowledge base, and one or more queried knowledge points are the candidate knowledge point set. For example, the knowledge points of the manual of the vehicle family can be found by using the vehicle family entity identified in the user question, and then the knowledge points related to the component can be further screened by using the vehicle component identified in the user question.
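The candidate query of step S806 amounts to filtering the knowledge base on two fields. A minimal sketch, assuming the knowledge base is held as a list of dictionaries with hypothetical field names:

```python
# Hypothetical in-memory knowledge base, for illustration only.
knowledge_base = [
    {"title": "Change engine oil", "product": "Golf",
     "component": "engine", "tags": {"engine oil", "replace"}},
    {"title": "Engine oil specification", "product": "Golf",
     "component": "engine", "tags": {"engine oil", "specification"}},
    {"title": "Change wheel", "product": "BMW X3",
     "component": "wheel", "tags": {"replace"}},
]


def candidate_points(entries, product, component):
    """Keep entries whose product and component both match the question."""
    return [
        entry for entry in entries
        if entry["product"] == product and entry["component"] == component
    ]


candidates = candidate_points(knowledge_base, "Golf", "engine")
```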

Next, in step S808, a first matching score between the labelset associated with the user question and the labelset associated with the knowledge point in the candidate knowledge point set is calculated as a matching score between the user question and the knowledge point. Here, a matching score needs to be calculated for each knowledge point in the candidate knowledge point set.

The first match score may be:

a first ratio of an intersection of a tag set associated with the user problem and a tag set associated with the knowledge point to a union of the two;

vector similarity of a label set associated with the user question and a label set associated with the knowledge point; or

The first ratio and a weighted average of vector similarities.

Specifically, a first matching score is calculated between the label set labels_q of the user question and the label set labels_k of the knowledge point. The first matching score is obtained by weighting the Jaccard index of the two label sets and the cosine similarity of their label vectors. The Jaccard index of two label sets is the ratio of their intersection to their union, and is calculated as:

$$\mathrm{Jaccard}(labels_q, labels_k) = \frac{|labels_q \cap labels_k|}{|labels_q \cup labels_k|}$$

In the above formula, the numerator is the size of the intersection of the two sets, and the denominator is the size of the union of the two sets.

Meanwhile, the label set can be vectorized, and the semantic vector vector(labels) of the label set is obtained by averaging the vectors of its label words:

$$\mathrm{vector}(labels) = \frac{1}{|labels|} \sum_{i=1}^{|labels|} \mathrm{vector}(labels_i)$$

where vector(labels_i) is the vector of the i-th label labels_i in the label set labels, the vector of each label word can be obtained by a word vector technique, and |labels| is the number of labels in the label set labels. The vector similarity of two label sets is expressed as the cosine similarity cos_simi(labels_q, labels_k) of their two vectors. Thus, the similarity between the user question label set and the knowledge point label set can be calculated by the following formula:

$$\mathrm{score}(labels_q, labels_k) = b \times \mathrm{Jaccard}(labels_q, labels_k) + (1-b) \times \mathrm{cos\_simi}(labels_q, labels_k)$$

where b is a hyperparameter that regulates the weight ratio.

In another implementation, the vector similarity cos_simi(q, title) between the user question q and the knowledge point title may be calculated as a second matching score, and the weighted average of the first matching score and the second matching score may be used as the matching score between the user question and the knowledge point. The question text and the knowledge point title can each be vectorized by averaging the vectors of the words they contain. The final matching score is thus obtained by weighting the label similarity and the similarity between the question text and the knowledge point title, and is calculated as:

$$\mathrm{score} = c \times \mathrm{score}(labels_q, labels_k) + (1-c) \times \mathrm{cos\_simi}(q, title)$$

where c is a hyperparameter that regulates the weight ratio.
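The two matching scores can be sketched as follows. The two-dimensional word vectors in `WORD_VECTORS` are toy values chosen for illustration; in practice they would come from a trained word vector model, and the default weights b and c are arbitrary.

```python
import math

# Hypothetical toy word vectors, for illustration only.
WORD_VECTORS = {
    "engine oil": [1.0, 0.0],
    "model": [0.0, 1.0],
    "replace": [1.0, 1.0],
}


def mean_vector(labels):
    """Semantic vector of a label set: the average of its label vectors."""
    vectors = [WORD_VECTORS[label] for label in labels]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cos_simi(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 0.0 if norm == 0 else dot / norm


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tag_score(labels_q, labels_k, b=0.5):
    """First matching score: weighted Jaccard index and cosine similarity."""
    cosine = cos_simi(mean_vector(labels_q), mean_vector(labels_k))
    return b * jaccard(labels_q, labels_k) + (1 - b) * cosine


def final_score(first_score, title_cosine, c=0.5):
    """Weighted average of the first and second matching scores."""
    return c * first_score + (1 - c) * title_cosine


labels_q = {"engine oil", "model"}
labels_k = {"engine oil", "replace"}
first = tag_score(labels_q, labels_k)
```

The Jaccard term rewards exact label overlap, while the cosine term still gives partial credit when the labels differ but are semantically close.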

Finally, after the matching score between the user question and each knowledge point in the candidate knowledge point set is obtained, the process goes to step S810. In step S810, knowledge points with matching scores greater than a predetermined threshold in the candidate knowledge point set are obtained as answers corresponding to the user questions.

In one implementation, if a plurality of knowledge points have matching scores greater than the predetermined threshold, a directory check is performed on them: knowledge points belonging to the same directory are aggregated into one aggregated knowledge point, and a predetermined number of knowledge points with the highest matching scores are then used as the answers corresponding to the user question. Here, the matching score of an aggregated knowledge point is the highest matching score among the knowledge points before aggregation.
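The answer selection of step S810, including the directory check, can be sketched as follows. The tuple layout, directory names, threshold and scores are hypothetical.

```python
def select_answers(scored_points, threshold=0.5, top_n=3):
    """Filter by threshold, aggregate per directory, and return the top results.

    scored_points: list of (title, directory, score) tuples.
    Returns a list of (directory, member titles, aggregated score).
    """
    by_directory = {}
    for title, directory, score in scored_points:
        if score <= threshold:
            continue  # only scores greater than the threshold qualify
        members, best = by_directory.get(directory, ([], float("-inf")))
        # The aggregated point takes the highest score among its members.
        by_directory[directory] = (members + [title], max(best, score))
    ranked = sorted(by_directory.items(), key=lambda item: item[1][1], reverse=True)
    return [(directory, members, score)
            for directory, (members, score) in ranked[:top_n]]


scored = [
    ("Change engine oil", "Engine/Oil", 0.7),
    ("Engine oil specification", "Engine/Oil", 0.9),
    ("Start engine", "Engine/Start", 0.6),
    ("Replace wiper blade", "Wiper", 0.2),
]
answers = select_answers(scored, threshold=0.5, top_n=2)
```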

In summary, the scheme of the embodiment of the invention has the following advantages:

1) High retrieval accuracy and recall for candidate knowledge points. Using knowledge base (e.g., knowledge graph) construction technology, the knowledge in unstructured product instruction manuals is structured into a knowledge graph of the field to which the product belongs, covering most of the knowledge points of the manuals. The domain ontology reasoning capability of the knowledge graph is fully used in the question-answering process, which improves the retrieval accuracy and recall of manual knowledge points.

2) Accurate semantic parsing of user questions and accurate answer matching. By extracting information such as the product, component and semantic labels from the user question, the invention can accurately understand the user's query intention. Based on this information, answers can be searched for precisely in the knowledge base and ranked by score, so that the answers are accurate and their quality is controllable.

For example, assuming that the user question is "which engine oil does the Golf use", the answers may first be limited, using the car series and component information in the question, to the knowledge points related to the engine in the automobile user manual of the Golf car series. Some of these knowledge points are shown schematically below. Each row is a knowledge point consisting of four parts separated by "|", which are, in order, the component, the labels, the title and the content:

engine | [engine oil, change] | Changing the engine oil | The engine oil must be changed regularly at the intervals specified in the maintenance manual. Because changing the engine oil and oil filter requires the corresponding expertise and special tools, it is recommended that this be done by a franchised dealer of the company. The same applies to the disposal of used oil, which should also be handled by a franchised dealer of the company. For detailed information on the engine oil maintenance interval, refer to the "maintenance manual". Additives in the engine oil can darken its color quickly; this is normal, and the engine oil does not need to be changed more frequently because of it.

engine | [engine oil, specification] | Engine oil specification | The correct engine oil must be used! Engine oil is an important factor affecting engine function and service life. When the vehicle leaves the factory, it is filled with a special high-quality multigrade engine oil that can be used all year round except in extremely cold climates. It is strongly recommended to use only engine oil approved by the company as suitable for the engine of the car you purchased. Like other parts of a car, engine oil is continuously developed; the franchised dealers of the company keep up with the latest developments and technical data on vehicle oils, so it is recommended that the engine oil be changed by a franchised dealer of the company. The quality of the engine oil must not only meet the requirements of the engine and the exhaust gas purification system, but also match the quality of the fuel. While the engine is running, the engine oil is always in contact with combustion residues and fuel, which accelerates its aging. The quality of engine oils sold on the market varies greatly, so care must be taken when selecting an engine oil. The selected engine oil must meet the VW 50200 standard, and high-quality unleaded gasoline meeting the GB17930 standard must be used. Therefore, engine oils that meet the VW 50400 and VW 50700 standards are not suitable for use in China. Engine oil specification permitted for use: gasoline engine: VW 50200.

…

The scoring algorithm identifies the second knowledge point as the best-matching knowledge point, and it is returned to the user as the answer.

Fig. 9 shows a block diagram of an apparatus 900 for building a question-answering system based on a product instruction manual according to one embodiment of the present invention. The apparatus 900 resides in a computing device (e.g., the aforementioned computing device 400) to cause the computing device to perform the method 500 of the present invention for constructing a question-answering system. As shown in fig. 9, the apparatus 900 includes:

a knowledge base construction unit 910 adapted to construct a knowledge base according to a product usage manual, each data entry of the knowledge base including an association relationship of a knowledge point with a product, a component and a tag set, the tag set including one or more semantic tags representing operation/description information related to the component;

a question template library construction unit 920 adapted to construct a question template library according to a historical question library, wherein each data entry of the question template library comprises an association relationship between a question template and a label set;

the question-answering system constructing unit 930 is adapted to construct the knowledge base and the question template base as a question-answering system, so that when a user question is received, the user question is matched with the question template base to obtain a tag set associated with the user question, and then a corresponding knowledge point is queried from the knowledge base according to a product, a component and the tag set associated with the user question as an answer corresponding to the user question.

The functions and specific execution logic of the knowledge base constructing unit 910, the question template base constructing unit 920 and the question-answering system constructing unit 930 may be described with reference to the method 500, and are not described herein again.

The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.

In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Claims

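1. A method of constructing a question-answering system based on product instruction manuals, executed in a computing device, the product instruction manuals being organized by directory, each lowest-level directory entry corresponding to a knowledge point, each knowledge point including a knowledge point title and knowledge point content, the method comprising:

building a knowledge base according to the product instruction manual, wherein each data entry of the knowledge base comprises an association relationship between a knowledge point and a product, a component and a label set, the label set comprises one or more semantic labels, and the semantic labels represent operation/description information related to the component;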

constructing a problem template library according to a historical problem library, wherein each data entry of the problem template library comprises an incidence relation between a problem template and a label set, and the problem template is obtained by replacing a product entity in a problem with the type of the product entity; and

constructing the knowledge base and the question template base into a question-answering system so as to match the user question with the question template base when receiving the user question, acquiring a label set associated with the user question, and further inquiring a corresponding knowledge point from the knowledge base according to a product, a component and the label set associated with the user question as an answer corresponding to the user question;

wherein, the building of the knowledge base according to the product instruction manual comprises the following steps:

determining a component associated with the knowledge point according to the knowledge point title;

determining a knowledge point set associated with the component according to the association relationship between the knowledge point and the component;

determining a tag set associated with the component according to the knowledge point set associated with the component;

for each of the knowledge points,

acquiring a label set associated with the part associated with the knowledge point;

matching the knowledge point title with the acquired label set, and taking one or more matched semantic labels as a label set associated with the knowledge point;

the knowledge point and the set of products, components and tags associated with the knowledge point are added to the knowledge base as a data entry.

2. The method of claim 1, wherein the determining the components associated with the knowledge points based on knowledge point titles comprises:

if the keyword of a certain part appears in the knowledge point title, the part is determined as the part associated with the knowledge point.

3. The method of claim 2, wherein the keywords of the part comprise: keywords associated with the name of the part and/or keywords for operating/describing the part.

4. The method of claim 1, wherein determining a set of tags associated with a component based on a set of knowledge points associated with the component comprises:

for each of the components,

traversing each knowledge point in the set of knowledge points associated with the part;

for each traversed knowledge point, extracting a core word representing the operation/description of the part from the knowledge point title of the knowledge point;

and summarizing the core words corresponding to all the traversed knowledge points to obtain a label set associated with the part.

5. The method of claim 4, wherein each semantic tag in the set of tags comprises one or more synonyms.

6. The method of claim 1, wherein the knowledge base is a knowledge graph, wherein products and knowledge points correspond to nodes in the knowledge graph, and components and label sets correspond to relationships between the nodes.
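
By way of non-limiting illustration, one possible in-memory form of such a knowledge graph is sketched below, assuming a reading in which a product and a knowledge point are the two nodes of an edge and the component and label set are attributes of that edge. The `Edge` class and `find_knowledge_points` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    product: str          # source node
    knowledge_point: str  # target node
    component: str        # attribute of the relationship
    tags: frozenset       # attribute of the relationship


def find_knowledge_points(edges, product, component, tags):
    """Return knowledge points linked to the product via a matching relationship."""
    wanted = frozenset(tags)
    return [e.knowledge_point for e in edges
            if e.product == product
            and e.component == component
            and wanted & e.tags]
```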

7. The method of claim 1, wherein the building a library of question templates from a library of historical questions comprises:

screening out problems related to the product instruction manual from the historical problem library to generate a candidate problem library;

for each question in the library of candidate questions,

performing product entity recognition, component recognition and core word recognition on the problem;

if the identified core word is a semantic label in a label set associated with the component or a synonym of the semantic label, adding the semantic label into the label set of the problem;

replacing the product entity in the problem with the type of the product entity to obtain a problem template of the problem;

and aggregating all the problem templates to generate a problem template library.
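
By way of non-limiting illustration, the per-question processing of claim 7 can be sketched as follows. The tables `PRODUCT_ENTITIES` and `COMPONENT_TAGS`, the placeholder `[car_series]` and the dictionary-based recognition are assumptions for this example; the claims do not specify how recognition is performed.

```python
import re

# Hypothetical tables: product entity -> entity type, and
# component -> {semantic label: synonyms of that label (including itself)}.
PRODUCT_ENTITIES = {"examplecar s1": "[car_series]"}
COMPONENT_TAGS = {
    "sunroof": {"open": {"open", "unlock"}, "close": {"close", "shut"}},
}


def build_template(question):
    """Return (problem template, label set), or None if no component is recognized."""
    text = question.lower()
    component = next((c for c in COMPONENT_TAGS if c in text), None)
    if component is None:
        return None

    # Core word recognition: keep a label if the label or a synonym occurs.
    words = set(re.findall(r"[a-z0-9]+", text))
    tags = {tag for tag, synonyms in COMPONENT_TAGS[component].items()
            if words & synonyms}

    # Replace each product entity with its type to obtain the template.
    template = text
    for entity, entity_type in PRODUCT_ENTITIES.items():
        template = template.replace(entity, entity_type)
    return template, tags
```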

8. The method of claim 7, wherein said screening a historical library of questions for questions related to a product instruction manual comprises:

and matching the questions in the historical question bank by using the part keywords so as to screen out the questions related to the product instruction manual.

9. The method of claim 7, wherein if the identified core word is not a semantic tag in the set of tags associated with the component or is not a synonym of the semantic tag, matching the problem with the semantic tag in the set of tags associated with the component, adding the matched semantic tag to the set of tags for the problem.

10. The method of claim 7, wherein the aggregating all the question templates to generate a question template library comprises:

deduplicating identical problem templates;

and merging similar problem templates, wherein the label set associated with a problem template obtained by merging is the union of the label sets respectively associated with the similar problem templates;

and adding each problem template obtained by the deduplication processing and the combination processing and the associated label set thereof into the problem template library as a data entry.
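
By way of non-limiting illustration, deduplication and merging can be sketched as follows. The claims do not state how similarity between templates is judged; this example assumes the character-level ratio from Python's standard `difflib.SequenceMatcher` and a hypothetical threshold of 0.9.

```python
from difflib import SequenceMatcher


def aggregate_templates(pairs, threshold=0.9):
    """Merge identical or similar templates; merged label sets are unioned."""
    library = []  # each item is [template, label_set]
    for template, tags in pairs:
        for entry in library:
            # Identical templates have ratio 1.0, so this also deduplicates.
            if SequenceMatcher(None, entry[0], template).ratio() >= threshold:
                entry[1] |= set(tags)
                break
        else:
            library.append([template, set(tags)])
    return [(template, tags) for template, tags in library]
```

In this sketch the first template encountered is kept as the representative of each merged group.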

11. The method of any one of claims 1-10, wherein matching the user question with the question template library to obtain a set of tags associated with the user question comprises:

performing product entity recognition and component recognition on the user question to obtain the product and the component related to the user question;

replacing the product entity in the user problem with the type of the product entity to obtain a generalized user problem;

matching the generalized user problems with the problem template library to obtain problem templates corresponding to the user problems;

and acquiring a label set associated with the question template from the question template library as a label set associated with the user question.
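
By way of non-limiting illustration, the matching flow of claim 11 can be sketched as follows for the exact-match case. The tables `PRODUCT_TYPES`, `COMPONENTS` and `TEMPLATE_LIBRARY` and the simple text normalization are assumptions for this example.

```python
# Hypothetical tables, assumed for illustration only.
PRODUCT_TYPES = {"examplecar s1": "[car_series]"}
COMPONENTS = ["sunroof", "wiper"]
TEMPLATE_LIBRARY = {"how do i open the sunroof on the [car_series]": {"open"}}


def tags_for_question(question):
    """Return (product, component, label set) for a user question."""
    text = question.lower().strip(" ?")
    # Product entity recognition and component recognition by table lookup.
    product = next((p for p in PRODUCT_TYPES if p in text), None)
    component = next((c for c in COMPONENTS if c in text), None)
    # Generalize the question by replacing the product entity with its type.
    generalized = text
    if product is not None:
        generalized = text.replace(product, PRODUCT_TYPES[product])
    # Exact lookup; returns None when no template matches.
    return product, component, TEMPLATE_LIBRARY.get(generalized)
```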

12. The method of claim 11, wherein said matching the generalized user question with the question template library to obtain a question template corresponding to the user question comprises:

if the question template is matched in the question template library, taking the matched question template as a question template corresponding to the user question;

and if the problem template is not matched in the problem template library, calculating the similarity between the user problem and each problem template in the problem template library, and taking the problem template with the highest similarity as the problem template corresponding to the user problem.

13. The method of claim 12, wherein the similarity is:

the edit distance similarity between the user question and the question template;

the vector similarity between the user question and the question template; or

a weighted average of the edit distance similarity and the vector similarity.
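
By way of non-limiting illustration, the fallback matching of claims 12 and 13 can be sketched as follows. The claims do not specify the vector representation or the weights; this example assumes bag-of-words count vectors with cosine similarity, edit distance normalized by the longer string, and a hypothetical weight of 0.5.

```python
import math
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def edit_similarity(a, b):
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def cosine_similarity(a, b):
    """Cosine similarity of bag-of-words count vectors (an assumed representation)."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def best_template(question, templates, weight=0.5):
    """Exact match first; otherwise the template with the highest weighted similarity."""
    if question in templates:
        return question

    def score(template):
        return (weight * edit_similarity(question, template)
                + (1 - weight) * cosine_similarity(question, template))

    return max(templates, key=score)
```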

14. The method of claim 11, wherein said querying the corresponding knowledge points from the knowledge base comprises:

acquiring a candidate knowledge point set from the knowledge base according to products and components related to user problems;

calculating a first matching score between the tag set associated with the user question and the tag set associated with each knowledge point in the candidate knowledge point set, as the matching score of the user question and the knowledge point;

and acquiring the knowledge points with the matching scores larger than a preset threshold value as answers corresponding to the user questions.

15. The method of claim 14, wherein the first match score is:

a weighted average of the first ratio and the vector similarity.

16. The method of claim 14, wherein said querying the corresponding knowledge point from the knowledge base further comprises:

calculating the vector similarity between the user question and the title of the knowledge point as a second matching score;

and taking the weighted average of the first matching score and the second matching score as the matching score of the user question and the knowledge point.
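
By way of non-limiting illustration, the query of claims 14 and 16 can be sketched as follows. Several choices here are assumptions for this example and are not taken from the claims: the first matching score is computed as the intersection-over-union of the two tag sets, token overlap between the question and the title stands in for vector similarity, and the weight 0.7 and threshold 0.3 are hypothetical.

```python
def overlap_ratio(a, b):
    """Intersection-over-union of two collections (an assumed scoring choice)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def query(kb, product, component, question_tags, question_text,
          weight=0.7, threshold=0.3):
    """Return (title, score) pairs above the threshold, best first."""
    results = []
    for entry in kb:
        # Candidate set: knowledge points of the same product and component.
        if entry["product"] != product or entry["component"] != component:
            continue
        first = overlap_ratio(question_tags, entry["tags"])
        # Stand-in for vector similarity between question and title.
        second = overlap_ratio(question_text.lower().split(),
                               entry["title"].lower().split())
        score = weight * first + (1 - weight) * second
        if score > threshold:
            results.append((entry["title"], score))
    return sorted(results, key=lambda r: r[1], reverse=True)
```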

17. The method of claim 14, wherein the obtaining of the knowledge point with the matching score larger than the predetermined threshold as the answer corresponding to the user question comprises:

and if a plurality of knowledge points have matching scores larger than the predetermined threshold, checking the catalogs of the knowledge points, aggregating the knowledge points belonging to the same catalog into an aggregated knowledge point, and taking a predetermined number of knowledge points with the highest matching scores as the answers corresponding to the user question.

18. The method of claim 17, wherein the match score for the aggregated knowledge point is the highest match score among the knowledge points before aggregation.
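
By way of non-limiting illustration, the aggregation of claims 17 and 18 can be sketched as follows. The tuple layout `(catalog, title, score)` and the default `top_n=3` are assumptions for this example; the catalog here corresponds to the directory to which a knowledge point belongs.

```python
from collections import defaultdict


def aggregate_by_catalog(scored, top_n=3):
    """Group knowledge points by catalog and keep the top_n groups by score."""
    groups = defaultdict(list)
    for catalog, title, score in scored:
        groups[catalog].append((title, score))

    aggregated = [
        {
            "catalog": catalog,
            "titles": [title for title, _ in items],
            # The aggregated score is the highest score before aggregation.
            "score": max(score for _, score in items),
        }
        for catalog, items in groups.items()
    ]
    aggregated.sort(key=lambda g: g["score"], reverse=True)
    return aggregated[:top_n]
```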

19. The method of claim 1, wherein the product instruction manual is an automotive user manual.

20. An apparatus, residing in a computing device, for constructing a question-answering system based on product instruction manuals organized in catalogues, each of the lowest catalogues corresponding to a knowledge point, each knowledge point including a knowledge point title and knowledge point content, the apparatus comprising:

the problem template library construction unit is suitable for constructing a problem template library according to a historical problem library, each data entry of the problem template library comprises an association relationship between a problem template and a label set, and the problem template is obtained by replacing a product entity in a problem with the type of the product entity;

the question-answering system construction unit is suitable for constructing the knowledge base and the question template base into a question-answering system so as to match the user question with the question template base when receiving the user question, obtain a label set associated with the user question, and further query a corresponding knowledge point from the knowledge base according to a product, a component and the label set associated with the user question to serve as an answer corresponding to the user question;

wherein the knowledge base construction unit is further adapted to:

for each of the knowledge points,

21. A computing device, comprising:

at least one processor; and

a memory storing program instructions configured for execution by the at least one processor, the program instructions comprising instructions for performing the method of any of claims 1-19.

22. A readable storage medium storing program instructions that, when read and executed by a computing device, cause the computing device to perform the method of any of claims 1-19.