WO2020077896A1 - Question data generation method, apparatus, computer device and storage medium - Google Patents

Question data generation method, apparatus, computer device and storage medium

Info

Publication number
WO2020077896A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
entity
user
target
entities
Application number
PCT/CN2019/070844
Other languages
English (en)
French (fr)
Inventor
臧磊
傅婧
郭鹏程
Original Assignee
深圳壹账通智能科技有限公司
Application filed by 深圳壹账通智能科技有限公司
Priority to SG11201913916QA
Publication of WO2020077896A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri

Definitions

  • This application relates to a method, apparatus, computer device, and storage medium for generating question data.
  • The inventors realized that the question data used in the current credit interview process is fixed rather than dynamic, which may lead to leakage of the credit interview question bank and therefore to low security of that question bank.
  • A method, apparatus, computer device, and storage medium for generating question data are provided.
  • A method for generating question data includes:
  • An apparatus for generating question data includes:
  • a receiving module, configured to receive user answer data, sent by a terminal, that corresponds to the current question data;
  • an extraction module, configured to extract keywords from the user answer data based on a preset keyword extraction method;
  • a query module, configured to query the constructed knowledge graph for entities and attributes matching the keywords;
  • a generation module, configured to determine target question data based on the queried entities and attributes; and
  • a sending module, configured to send the target question data to the terminal for display.
  • A computer device includes a memory and one or more processors.
  • The memory stores computer-readable instructions.
  • When the computer-readable instructions are executed by the one or more processors, the one or more processors implement the steps of the question data generation method provided in any embodiment of this application.
  • One or more non-volatile computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps of the question data generation method provided in any embodiment of this application.
  • FIG. 1 is an application scenario diagram of a method for generating question data according to one or more embodiments.
  • FIG. 2 is a schematic flowchart of a method for generating question data according to one or more embodiments.
  • FIG. 3 is a schematic flowchart of a method for generating question data in another embodiment.
  • FIG. 4 is a block diagram of an apparatus for generating question data according to one or more embodiments.
  • FIG. 5 is a block diagram of an apparatus for generating question data in another embodiment.
  • FIG. 6 is a block diagram of a computer device according to one or more embodiments.
  • the question data generation method provided in this application can be applied to the application environment shown in FIG. 1.
  • The terminal 102 communicates with the server 104 over a network.
  • The server 104 receives the user answer data that the terminal 102 collects for the current question data, extracts keywords from the user answer data, queries the constructed knowledge graph for the entities and attributes matching the extracted keywords, generates target question data from the queried entities and attributes, and sends the target question data to the terminal 102 for display.
  • The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device.
  • the server 104 may be implemented by an independent server or a server cluster composed of multiple servers.
  • A method for generating question data is provided. Taking its application to the server in FIG. 1 as an example, the method includes the following steps:
  • The current question data is the question data currently being posed to the user, that is, the question data of the current question-and-answer round.
  • the current question data may specifically refer to the question data that is closest to the current time, that is, the latest question data.
  • The question data may specifically be text data shown on a display screen, or voice information played back by voice broadcast.
  • the current question data corresponds to the user answer data and the user answer image currently fed back by the terminal.
  • the user answer data is the user's answer data corresponding to the current question data.
  • The user answer data may specifically be text answer data entered manually by the user on the terminal, or answer information in voice form collected by the terminal while the user answers the current question data.
  • When the terminal receives the current question data sent by the server, it presents the question to the user by voice broadcast or on the display screen, and detects in real time the user answer data that the user gives in response to the current question data.
  • the terminal sends the detected user answer data to the server.
  • the server receives the question and answer instruction, obtains the corresponding current question data according to the received question and answer instruction, and sends the acquired current question data to the terminal.
  • the server selects the current question data from the pre-stored preset question bank according to the question and answer instruction.
  • the server may also obtain the corresponding question material from the pre-stored question material set according to the question and answer instruction, and generate corresponding current question data according to the acquired question material.
  • The server may also select an entity and its corresponding attributes from the constructed knowledge graph according to the question and answer instruction, and generate the current question data from the selected entity and attributes.
  • the preset question bank is a question data set composed of multiple question data generated in advance.
  • The question material set is a collection of multiple question materials, such as ID card number, nationality, and current residence.
  • When the terminal presents the current question data to the user, it detects the user's answer operation on the current question data in real time and obtains the corresponding user answer data from the detected operation. Answer operations include, for example, input operations on the terminal and spoken answers.
  • When the received user answer data is in voice form, the server performs speech recognition on it to obtain the corresponding answer text, and performs the subsequent steps of the question data generation method on that text.
  • Speech recognition can use existing speech recognition technology and is not described here.
  • the preset keyword extraction method is a preset method for extracting keywords in user answer data.
  • The preset keyword extraction method may be keyword matching based on a preset keyword library; it may segment the user answer data into words and select keywords from the resulting words according to preset filtering conditions; or it may extract keywords with a trained keyword extraction model.
  • The keyword extraction model may be trained in a supervised, semi-supervised, or unsupervised manner.
  • the server performs keyword extraction on the received user answer data based on a preset keyword extraction method to obtain keywords corresponding to the user answer data.
  • the server performs word segmentation on the user's answer data, matches each word obtained by the word segmentation with the preset keyword library, and uses words that successfully match the preset keyword library as keywords.
  • The preset keyword library is a set of multiple preset keywords.
  • The server may also segment the user answer data into words to obtain a word segmentation result, calculate a feature weight for each word, sort the words by feature weight, and select keywords according to the sorted result. Specifically, the server calculates the TF-IDF weight of each word in the word segmentation result, sorts the words in descending order of TF-IDF weight, and selects a preset number of top-ranked words as keywords.
  • The TF-IDF weight of each word $w$ is calculated as follows: first, calculate the term frequency (TF) of $w$ in the word segmentation result; then calculate its inverse document frequency (IDF) over a corpus; the TF-IDF weight is the product of the two:
  • $\mathrm{TF}(w) = \dfrac{\text{number of times } w \text{ appears in the document}}{\text{total number of words in the document}}$
  • $\mathrm{IDF}(w) = \log \dfrac{\text{total number of documents in the corpus}}{\text{number of documents containing } w + 1}$
  • $\text{TF-IDF}(w) = \mathrm{TF}(w) \times \mathrm{IDF}(w)$
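The TF-IDF ranking above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes the answer has already been segmented into a token list and that a reference corpus of segmented documents is available. The function name `tf_idf_keywords` and the `top_k` parameter (standing in for the "preset number" of keywords) are hypothetical.

```python
import math
from collections import Counter


def tf_idf_keywords(words, corpus, top_k=3):
    """Rank segmented answer words by TF-IDF and return the top_k as keywords.

    words  -- the segmented user answer (list of tokens)
    corpus -- reference documents, each a set or list of tokens (assumed available)
    """
    if not words or not corpus:
        return []
    counts = Counter(words)
    total_words = len(words)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        tf = count / total_words
        # The +1 in the denominator (as in the IDF formula) avoids division by zero
        docs_with_word = sum(1 for doc in corpus if word in doc)
        idf = math.log(n_docs / (docs_with_word + 1))
        scores[word] = tf * idf
    # Descending weight; ties broken alphabetically so the result is deterministic
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]
```

For example, with a four-document corpus in which "i" appears everywhere and "nanshan" nowhere, `tf_idf_keywords(["i", "live", "in", "nanshan"], corpus, top_k=2)` ranks "nanshan" first and pushes the ubiquitous "i" to the bottom, because its IDF is negative.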
  • the server inputs user answer data into a pre-trained keyword extraction model for prediction, and obtains keywords corresponding to the user answer data.
  • the keyword extraction model is a model obtained by performing model training according to the training sample set obtained in advance.
  • the knowledge graph is a semantic network graph containing various entities or concepts and their relationships.
  • a node in the knowledge graph is an entity node corresponding to an entity.
  • An edge connected to any entity node represents an attribute of the entity corresponding to the entity node.
  • The entity corresponding to the node at the other end of that edge is the value of that attribute for the entity.
  • For example, if the entity corresponding to an entity node is the headquarters of XX Group and an edge connected to that node represents the attribute "address", then the entity corresponding to the node at the other end of that edge is Futian District, Shenzhen.
  • the entities in the knowledge graph may include concepts, names of people, places, names of enterprises, etc.
  • the attribute in the knowledge graph may be the relationship (entity relationship) between two entities connected by the edge corresponding to the attribute, or the characteristics possessed by the corresponding entity itself.
  • the knowledge graph may specifically be a knowledge base containing a large number of triples.
  • the triplet includes an entity, an entity relationship, and an associated entity associated with the entity through the entity relationship.
  • The entity relationship in each triple is used as the attribute between the entity and its associated entity, and a directed edge is established from the entity to the associated entity according to that attribute, thereby constructing the knowledge graph.
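As an illustration of building a graph from triples, the following Python sketch stores each (entity, entity relationship, associated entity) triple as a directed, attribute-labelled edge in an adjacency map. The `KnowledgeGraph` class and the sample triples are hypothetical; a production system would more likely keep the graph in a graph database.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy directed knowledge graph built from (entity, relation, associated entity) triples."""

    def __init__(self, triples):
        # Adjacency map: entity -> attribute -> list of associated entities
        self.edges = defaultdict(lambda: defaultdict(list))
        for head, relation, tail in triples:
            # The entity relationship becomes the attribute on the edge head -> tail
            self.edges[head][relation].append(tail)

    def attributes(self, entity):
        """Attributes (outgoing edge labels) of an entity, sorted for stable output."""
        return sorted(self.edges.get(entity, {}).keys())

    def associated(self, entity, attribute):
        """Associated entities (attribute values) reached from entity via attribute."""
        return list(self.edges.get(entity, {}).get(attribute, []))
```

For instance, after `kg = KnowledgeGraph([("XX Group", "address", "Futian District, Shenzhen"), ("XX Group", "legal representative", "Zhang San")])`, the call `kg.associated("XX Group", "address")` returns the address node, and `kg.attributes("XX Group")` lists both edge labels.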
  • the server matches the extracted keywords with each entity and attribute in the constructed knowledge graph, and determines the entities and attributes that match the keyword successfully as the entities and attributes that match the keyword.
  • the server matches the extracted keywords with each entity in the knowledge graph, and determines the matched entity as the entity that is queried according to the keyword.
  • the server matches each extracted keyword with each attribute in the knowledge graph, and determines the attribute that matches successfully as the attribute that is queried according to the keyword.
  • the server may match the extracted synonyms of the keyword with each attribute in the knowledge graph, and determine the attribute that matches successfully as the attribute that matches the keyword.
  • The server may also match the extracted keywords against the keywords of each attribute in the knowledge graph, and determine the successfully matched attribute as the attribute matching the keywords.
  • The server matches the extracted keywords against each entity in the knowledge graph, mapping each keyword to an entity in the graph and thereby finding the entities that match the keywords.
  • The server uses NLP (natural language processing) technology to understand the semantics expressed by the user answer data, matches those semantics against the attributes in the knowledge graph to identify the attribute corresponding to the user answer data, and determines that attribute as the attribute matching the corresponding keywords.
  • the server may also match the semantics expressed in the current question data with the attributes in the knowledge graph to determine the attributes that match the extracted keywords.
  • The keywords extracted from the user answer data can be understood as the named entities identified from the user answer data, and the semantics expressed by the user answer data can be understood as the attributes corresponding to those named entities.
  • Named entities include person names, place names, and corporate organization names.
  • the server may determine the named entity according to the corresponding multiple keywords.
  • the server queries the matching entity in the knowledge graph according to the identified named entity.
  • the server calculates the matching rate between the named entity and each entity in the knowledge graph, and determines the entity that matches the named entity when the matching rate reaches a preset threshold.
  • Matching rate refers to the degree of matching between named entities and corresponding entities.
  • When the named entity is the name of an enterprise or institution, the matching entity may be that name itself or another name of the same enterprise or institution, such as its abbreviation or full name.
  • When the named entity is a place name, the matching entity may be that place name itself, or another place name whose address lies within a preset range of the address that the place name refers to.
  • For example, the named entity identified from the user answer data in the above manner is "No. 58 Keyuan Road, Nanshan District", and the attribute corresponding to it can be determined to be "address" using the above keyword extraction or semantic understanding method.
  • The entity that the server finds in the knowledge graph as matching this named entity may be "No. 58 Keyuan Road, Nanshan District" or "No. 59 Keyuan Road, Nanshan District".
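The patent does not specify how the matching rate is computed. The sketch below assumes one plausible choice, the character-level similarity ratio from Python's standard `difflib.SequenceMatcher`, together with a hypothetical preset threshold of 0.9. Under these assumptions, the address example above matches both "No. 58" and "No. 59".

```python
from difflib import SequenceMatcher


def match_entities(named_entity, graph_entities, threshold=0.9):
    """Return (entity, matching rate) pairs whose rate reaches the threshold.

    The matching rate here is SequenceMatcher's ratio (an assumed stand-in for
    the unspecified metric); the threshold is a hypothetical preset value.
    """
    matches = []
    for entity in graph_entities:
        rate = SequenceMatcher(None, named_entity, entity).ratio()
        if rate >= threshold:
            matches.append((entity, round(rate, 3)))
    # Highest matching rate first
    matches.sort(key=lambda pair: -pair[1])
    return matches
```

Because "No. 58 Keyuan Road, Nanshan District" and "No. 59 Keyuan Road, Nanshan District" differ in a single character, their ratio is about 0.97, so both pass the threshold while an unrelated address does not.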
  • The server may also take the full text of the user answer data as its keyword. Further, the server may determine the semantics expressed by the user answer data from the current question data together with the keywords extracted from the user answer data, and query the constructed knowledge graph for the entities and attributes matching those semantics. For example, suppose the current question data is "There is an XX Bank at No. 59 Keyuan Road, Nanshan District, right?" and the corresponding user answer data is "Yes". From the semantics expressed by the user answer data, the server determines the entity and attribute to be "No. 59 Keyuan Road, Nanshan District" and "address", respectively.
  • S208 Determine target question data according to the queried entity and attribute.
  • The target question data is the question data for the next question posed to the user after the current question data.
  • the target question data may specifically be text data that can be displayed through the display screen, or voice information that can be displayed through voice broadcast.
  • The server queries the knowledge graph for associated entities that are linked to the queried entity through the queried attribute.
  • the server generates corresponding target question data according to the queried entity, attribute and corresponding associated entity.
  • For example, suppose the entity found in the knowledge graph for the identified named entity is "No. 59 Keyuan Road, Nanshan District", and the associated entity found in the knowledge graph through the attribute "address" is "XX Bank".
  • The target question data generated may then be "There is an XX Bank at No. 59 Keyuan Road, Nanshan District, right?"
  • The server queries the constructed knowledge graph for the entity matching the keywords extracted from the user answer data, and determines the one or more edges connected to that entity in the knowledge graph.
  • the one or more edges represent the attribute of the entity, and the server may select an attribute from the one or more attributes, and generate target question data according to the selected attribute and the corresponding entity.
  • the server correspondingly queries one or more associated entities associated with the entity through the one or more edges in the knowledge graph.
  • the one or more associated entities are attribute values corresponding to the attributes.
  • The server selects one of the attributes, selects the associated entity corresponding to that attribute from the one or more associated entities, and generates target question data from the selected attribute, the associated entity, and the corresponding entity.
  • For example, suppose the queried entity is "XX Group".
  • The attributes found for this entity in the knowledge graph may include "address", "legal representative", and "establishment time".
  • The target question data generated in the above manner may then be "Who is the legal representative of XX Group?" or "The legal representative of XX Group is Zhang San, right?"
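One way to turn a queried entity, a selected attribute, and its associated entity into target question data is template filling, as in the XX Group example. The Python sketch below is an assumption rather than the patent's method: the graph is a plain dict mapping each entity to its attributes and each attribute to a list of associated entities, the question templates are hypothetical, and the choice between an open question and a confirmation question is made at random.

```python
import random

# Hypothetical question templates; the patent does not specify any wording.
OPEN_TEMPLATES = {"legal representative": "Who is the legal representative of {entity}?"}
DEFAULT_OPEN = "What is the {attribute} of {entity}?"
CONFIRM = "The {attribute} of {entity} is {value}, right?"


def generate_question(graph, entity, rng=random):
    """Select one attribute of `entity` and build a target question from it.

    graph: dict mapping entity -> attribute -> list of associated entities.
    Returns None when the entity has no attributes in the graph.
    """
    attributes = sorted(graph.get(entity, {}))
    if not attributes:
        return None
    attribute = rng.choice(attributes)
    if rng.random() < 0.5:
        # Open question: ask the user to supply the attribute value
        template = OPEN_TEMPLATES.get(attribute, DEFAULT_OPEN)
        return template.format(attribute=attribute, entity=entity)
    # Confirmation question: state an associated entity and ask the user to confirm
    value = rng.choice(graph[entity][attribute])
    return CONFIRM.format(attribute=attribute, entity=entity, value=value)
```

Passing a seeded `random.Random` instance as `rng` makes the output reproducible in tests, while the default module-level generator keeps questions unpredictable in use.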
  • S210 Send target question data to the terminal for display.
  • the server sends target question data determined corresponding to the queried entity and attribute to the terminal through the network.
  • the terminal displays the received target question data to the corresponding user through display screen display or voice broadcast.
  • When the terminal presents the received target question data to the user by voice broadcast or on the display screen, it detects in real time the user answer data that the user gives in response to the target question data, and sends the detected user answer data to the server.
  • The server then continues to perform the relevant steps of the above question data generation method on the newly received user answer data.
  • the server sends the target question data to the user terminal to display the target question data to the corresponding user through the user terminal.
  • the server can also send the target question data to the salesperson terminal for display.
  • In the above method, the server receives the user answer data that the terminal returns for the current question data, extracts keywords from it based on a preset keyword extraction method, queries the constructed knowledge graph for the entities and attributes matching the extracted keywords, and generates the target question data from the queried entities and attributes, which improves the accuracy of the target question data.
  • Because the target question data for the next question is determined dynamically rather than fixed in advance, leakage of the question bank formed by the question data can be effectively avoided, which improves the security of the question bank.
  • Step S208 includes: querying the knowledge graph, according to the queried entity and attribute, for the associated entities linked to the entity through the attribute, where there are multiple associated entities; selecting a target associated entity from the associated entities; and generating the target question data from the entity, the attribute, and the target associated entity.
  • An associated entity is an entity that is associated with another entity through edges (attributes) in the knowledge graph. It can be understood that both the entity and the associated entity are entity nodes in the knowledge graph, and the entity and the associated entity are related by attributes, that is, the attributes are used to represent the entity relationship between the entity and the corresponding associated entity. Two entities related by attributes can be related entities.
  • the associated entity may be an attribute value corresponding to the corresponding entity, or may be another entity associated with the corresponding entity through the entity relationship characterized by the attribute.
  • the server queries a plurality of related entities associated with the entity through the attribute in the knowledge graph.
  • the server selects a target related entity from the plurality of queried related entities, and generates target question data according to the target related entity, and corresponding entities and attributes.
  • the server may randomly select an associated entity as the target associated entity from the plurality of queried associated entities.
  • The server may also obtain the user ID corresponding to the received user answer data, count the total number of associated entities found, compute a hash value of the user ID based on that total (for example, the hash of the user ID modulo the total), and select the target associated entity from the associated entities according to the computed hash value.
  • the server may also sequentially select an associated entity as the target associated entity from the plurality of queried associated entities by polling.
  • Selecting a target associated entity from the multiple associated entities found, and generating the target question data from it together with the corresponding entity and attribute, further increases the uncertainty of the target question data and thus further improves the security of the question bank.
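The three selection strategies described above (random choice, hashing the user ID, and polling) can be sketched in Python as below. The function and class names are hypothetical. For the hash strategy this sketch assumes SHA-256 of the user ID reduced modulo the number of candidates; Python's built-in `hash()` is avoided because string hashes are randomized per process, which would make the selection unstable across server restarts.

```python
import hashlib
import itertools
import random


def select_random(candidates, rng=random):
    """Random strategy: pick any associated entity."""
    return rng.choice(candidates)


def select_by_user_hash(candidates, user_id):
    """Hash strategy: the same user ID always maps to the same candidate."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    # Reduce the hash modulo the total number of associated entities
    return candidates[int(digest, 16) % len(candidates)]


class RoundRobinSelector:
    """Polling strategy: hand out candidates in turn, wrapping around at the end."""

    def __init__(self, candidates):
        self._cycle = itertools.cycle(candidates)

    def select(self):
        return next(self._cycle)
```

The hash strategy keeps a given user's question stable across retries, whereas the random and polling strategies vary the question more; which property matters more depends on the interview setting.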
  • step S204 includes: preprocessing the user answer data; extracting keywords from the preprocessed user answer data according to a matching method based on a preset keyword library.
  • Preprocessing includes word segmentation processing and stop word processing.
  • Word segmentation processing is a process of dividing user answer data in text form into words one by one.
  • Various word segmentation algorithms may be used, for example, algorithms based on string matching, on semantic analysis, or on statistics.
  • String-matching-based algorithms include the forward maximum matching, reverse maximum matching, least cut, and bidirectional maximum matching algorithms.
  • the trained word segmentation model can also be used to perform word segmentation processing on user answer data.
  • The word segmentation model may specifically be a hidden Markov model (HMM) or a CRF (conditional random field) model.
  • Stop words are characters or words that are automatically filtered out before or after natural language data (or text) is processed in information retrieval, in order to save storage space and improve search efficiency; examples include modal particles, polite expressions, prepositions, and conjunctions.
  • The preset keyword library is a set of multiple preset keywords defined in advance.
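Forward maximum matching, the first string-matching algorithm listed above, can be sketched in a few lines of Python. At each position the longest dictionary word starting there is taken, falling back to a single character when nothing matches. The dictionary in the example is a hypothetical toy word list, and Chinese input is used because segmentation matters most for text written without spaces.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment `text` greedily from left to right (forward maximum matching)."""
    if max_len is None:
        # The longest dictionary word bounds the matching window
        max_len = max((len(word) for word in dictionary), default=1)
    max_len = max(max_len, 1)  # guard against a zero-width window
    words, i = [], 0
    while i < len(text):
        # Try the longest window first, shrinking one character at a time
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Accept a dictionary word, or fall back to a single character
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words
```

With the dictionary `{"南山区", "南山", "科苑路", "58号"}`, the text "南山区科苑路58号" segments into "南山区", "科苑路", "58号": the greedy rule prefers the three-character "南山区" over the shorter "南山". Reverse maximum matching applies the same idea scanning from the right.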
  • the preset keyword library may also include synonyms of each preset keyword.
  • The server performs word segmentation on the received user answer data to obtain a word segmentation result, and removes stop words from that result to obtain a candidate keyword set.
  • the server can match each word in the word segmentation result with the pre-built stop word library, determine the words that successfully match the stop word library as stop words, and remove the stop word from the word segmentation results.
  • the server uses each word in the word segmentation result after removing stop words as a candidate keyword to obtain a corresponding candidate keyword set.
  • the server matches each candidate keyword in the candidate keyword set with the preset keyword in the preset keyword library and the synonyms of the preset keyword, respectively.
  • A successful match indicates that the candidate keyword is a preset keyword in the preset keyword library or a synonym of one, and the server determines that candidate keyword as a keyword extracted from the preprocessed user answer data.
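The preprocessing and library-matching steps above can be sketched in Python as follows, assuming the answer has already been segmented into tokens. The stop-word set and the keyword library (a dict mapping each preset keyword to a set of its synonyms) are hypothetical sample data. As in the text, a candidate that matches either a preset keyword or one of its synonyms is kept as-is.

```python
def extract_keywords(tokens, stop_words, keyword_library):
    """Remove stop words, then keep candidates found in the keyword library.

    keyword_library: dict mapping each preset keyword to a set of its synonyms.
    """
    # Preprocessing: drop stop words to obtain the candidate keyword set
    candidates = [t for t in tokens if t not in stop_words]
    # Everything a candidate may match: each preset keyword plus its synonyms
    vocabulary = set(keyword_library)
    for synonyms in keyword_library.values():
        vocabulary.update(synonyms)
    # Keep matching candidates in their original order, without duplicates
    keywords = []
    for word in candidates:
        if word in vocabulary and word not in keywords:
            keywords.append(word)
    return keywords
```

For example, with the stop words `{"well", "I", "in", "ah"}` and a library in which "address" has the synonyms "live" and "reside", the tokens "well / I / live / in / Nanshan District / ah" yield the keywords "live" and "Nanshan District".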
  • the server uses NLP (Natural Language Processing) technology to process the user answer data, and semantically analyzes and understands the processed user answer data to obtain corresponding keywords.
  • The server may also preprocess the user answer data and extract keywords from the preprocessed data using dictionary-based and regular-expression matching. Specifically, the server matches each candidate keyword obtained by preprocessing against the preset keywords in a preset dictionary, and determines the successfully matched candidate keywords as the extracted keywords.
  • the keywords are extracted based on the preset keyword library, so that the keywords extracted from the user answer data are more accurate.
  • the above question data generation method further includes: acquiring target data; identifying each target entity in the target data and the entity relationship between the target entities; based on each target entity and the corresponding entity relationship, Construct the knowledge graph according to the preset construction method.
  • the target data is the original data used to construct the knowledge graph.
  • the target data may specifically be original data of one or more designated fields, such as the financial field and the manufacturing field.
  • the financial field may include industries such as banking, securities, insurance, trust, credit, and funds.
  • Each industry may include multiple enterprise institutions, and each enterprise institution corresponds to the corresponding original enterprise data.
  • the original data can include internal data and external data.
  • Internal data is data stored in the enterprise's local database, such as operational data and business data.
  • External data is data crawled from other storage spaces through third-party platforms, such as business registration data, CBRC (China Banking Regulatory Commission) data, People's Bank of China data, and company annual reports.
  • The original enterprise data of a group company may also include the original data of each subsidiary, such as investment relationships, establishment time, and address. For a bank, the original enterprise data may also include the address of each branch.
  • When the server receives a construction instruction for the knowledge graph, it obtains published original data from various channels according to the instruction, and obtains pre-stored, unpublished original data from the local database, to obtain the corresponding target data.
  • the server performs data processing on the acquired target data, and then identifies each target entity in the target data and the entity relationship between each target entity.
  • The server treats the entity relationship between any two target entities as the attribute between them, and establishes a directed edge between the two entities according to that attribute. The server performs these steps for every target entity and entity relationship identified in the target data, thereby constructing the knowledge graph corresponding to the target data.
  • the server may use data warehouse technology to perform data processing on the acquired target data.
  • the server saves the relational data in the knowledge graph constructed in the above manner to the graph database.
  • The graph database may be Neo4j, a high-performance NoSQL graph database.
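To store the triples in Neo4j, each triple can be written with a Cypher `MERGE` statement so that re-importing the same data does not create duplicate nodes or edges. The Python sketch below only builds the query string and its parameters. The `Entity` label, the generic `REL` relationship type, and the `attribute` property are an assumed schema, chosen because Cypher does not allow relationship types to be passed as query parameters.

```python
def triple_to_cypher(head, attribute, tail):
    """Build a parameterized Cypher MERGE statement for one triple.

    Assumed schema: nodes labelled :Entity with a `name` property, and a
    generic :REL relationship carrying the attribute as a property.
    """
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        "MERGE (h)-[:REL {attribute: $attribute}]->(t)"
    )
    params = {"head": head, "tail": tail, "attribute": attribute}
    return query, params
```

With the official `neo4j` Python driver, the returned pair could then be executed inside a session as `session.run(query, params)`. Keeping values in parameters rather than string-formatting them into the query also avoids Cypher injection from crawled text.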
  • the target data includes original data of a plurality of specified fields.
  • the server can use the domain as the vertex (core node) of the knowledge graph, and determine the associated node connected to the core node according to the entity relationship identified from the target data.
  • the server determines the related nodes connected to each related node according to the entity relationship, and takes the entity relationship between each node as an attribute, and constructs a knowledge graph based on the determined each node and the corresponding attribute.
  • the core nodes and associated nodes are all nodes (entity nodes) in the knowledge graph.
  • The server may also construct a knowledge graph for each specified domain separately in the above manner, and fuse the knowledge graphs of any two specified domains according to the entities or attributes they share, to obtain a cross-domain knowledge graph.
  • An entity or attribute that has an intersection between any two specified domains refers to an entity or attribute contained in the knowledge graph corresponding to each of the two specified domains.
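A minimal sketch of the cross-domain fusion, assuming each domain's knowledge graph is a set of (entity, attribute, associated entity) triples and that entities with identical names are the same entity. Real systems typically need an entity alignment step first (for example, mapping abbreviations to full names); that step is omitted here, and the domain data in the example is hypothetical.

```python
def fuse_graphs(graph_a, graph_b):
    """Fuse two domain graphs given as sets of (head, attribute, tail) triples.

    Returns None when the domains share no entity or attribute; otherwise a
    dict with the fused triples and the shared entities and attributes.
    """
    def entities(graph):
        return {h for h, _, _ in graph} | {t for _, _, t in graph}

    def attributes(graph):
        return {a for _, a, _ in graph}

    shared_entities = entities(graph_a) & entities(graph_b)
    shared_attributes = attributes(graph_a) & attributes(graph_b)
    if not shared_entities and not shared_attributes:
        return None  # no intersection: nothing to fuse on
    # Identical entity names collapse into a single node once triples are unioned
    return {
        "triples": graph_a | graph_b,
        "shared_entities": shared_entities,
        "shared_attributes": shared_attributes,
    }
```

For example, a banking graph containing ("XX Bank", "parent company", "XX Group") and a securities graph containing ("XX Group", "subsidiary", "XX Securities") are linked through the shared entity "XX Group".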
  • The entities and entity relationships are identified from the target data and the knowledge graph is constructed from them in advance, so the constructed knowledge graph can be used directly when question data is generated dynamically, which improves the efficiency of generating question data.
  • The user answer data may include answer voice information.
  • In that case, step S204 includes: performing speech recognition on the answer voice information to obtain the corresponding answer speech text; and extracting keywords from the answer speech text based on a preset keyword extraction method.
  • Answering voice information is the correspondingly collected voice information when the user performs the answering operation on the current question data.
  • Answer voice text is the voice text content carried in the answer voice information.
  • the server obtains corresponding answering voice information from the received user answering data, and performs speech recognition on the obtained answering voice information to extract corresponding answering voice text from the answering voice information.
  • the server performs word segmentation on the extracted answer text and matches each word obtained with the word segment with the preset keyword library.
  • the words that successfully match the preset keyword library are extracted from the answer speech text Key words.
  • the server extracts the corresponding answer speech text from the answer speech information through the pre-trained speech recognition model, and extracts the corresponding keyword from the answer speech text through the pre-trained keyword extraction model. It can be understood that the server can also extract corresponding answer text from the answer voice information based on the voice recognition technology in other existing technologies, and similarly, can also extract keywords correspondingly based on keyword extraction technologies in other existing technologies And will not be repeated here.
  • the server performs voiceprint recognition on the answering voice information to extract the corresponding target voiceprint feature, and authenticates the user who performs the answering operation on the current question data according to the target voiceprint feature.
  • the server continues to perform the step of performing speech recognition on the answering voice information to obtain the corresponding answering voice text.
  • the corresponding answer voice text is recognized based on the voice recognition technology, and then keywords are correspondingly extracted from the answer voice text to improve the processing efficiency of the user answer data, thereby Improve the efficiency of generating question data.
  • The user answer data includes user answer text and user answer imagery. Before step S206, the method further includes: determining the answer score for the user answer text and the expression score for the user answer imagery; determining the comprehensive score from the answer score and the expression score; and, when the comprehensive score is below a preset score threshold, performing the step of querying matching entities and attributes in the constructed knowledge graph according to the keywords.
  • User answer text is answer data in text form.
  • The terminal can obtain the user answer text from the user's manual input, or use the content recognized from the voice information collected while the user answers as the user answer text.
  • User answer imagery is the image information collected while the user answers the current question data.
  • The image information may be information in the form of images or video captured by an image collector.
  • User answer imagery may include, but is not limited to, user answer pictures and user answer videos.
  • A user answer picture is, for example, a picture containing the user's face captured while the user answers.
  • A user answer video is, for example, video captured during the user's answer operation.
  • The image collector may be a camera, either built into the terminal or a separate component connected point-to-point to the terminal.
  • The user answer imagery corresponds to the user answer text.
  • The server obtains the user answer text and the user answer imagery from the received user answer data.
  • The server determines the answer score for the user answer text with a preset answer-score method and the expression score for the user answer imagery with a preset expression-score method.
  • The server determines the comprehensive score of the user answer data from the answer score and the expression score and compares it with a preset score threshold. When the comprehensive score is below the threshold, the server performs the step of querying the constructed knowledge graph for entities and attributes matching the keywords extracted from the user answer text contained in the user answer data.
  • The preset answer-score method may determine the answer score from the match rate between the user answer text and the corresponding preset answer together with the preset score for the current question data.
  • The answer score may also be determined by matching the keywords in the user answer text against preset keywords, or by feeding the user answer text into a trained answer-score prediction model.
  • The preset expression-score method may extract the user's micro-expressions from the user answer imagery and determine the expression score from them using micro-expression recognition.
  • The server obtains multiple frames of the user's face from the user answer imagery, determines an expression score for each frame using micro-expression recognition, and determines the expression score of the user answer imagery from these per-frame scores.
  • When the expression score of the user answer imagery is below a preset expression-score threshold, the server performs the step of querying matching entities and attributes in the constructed knowledge graph according to the keywords.
  • A comprehensive score below the preset score threshold indicates that the correctness of the user answer text is in doubt; entities and attributes related to the current question data are then selected from the constructed knowledge graph to generate the target question data dynamically, making it more accurate.
  • The user answer data corresponds to a user identifier.
  • The method further includes: counting the total number of question data items for the user identifier; stopping the question data generation process when the count reaches a preset total; determining the total score for the user identifier from the comprehensive scores of its question data items; and pushing the total score to the terminal for display.
  • The server receives the user answer data for the user identifier and the current question data, counts the question data items for that identifier, and compares the count with the preset total. When the count reaches the preset total, the server stops the current question data generation process. The server determines the comprehensive score of the current question data as above, obtains the comprehensive scores of the identifier's existing question data items, and sums the current and existing scores, directly or with weights, to determine the identifier's total score, which it sends to the terminal for display.
  • The server generates prompt information from the calculated total score and a preset total-score threshold and pushes it to the terminal. For example, when the method is applied to the question-and-answer session of a credit interview review, the server may generate prompt information indicating whether the review passed or failed; a credit interview review is the verification of the user's identity during credit business processing. When the method is applied to a job interview, the server may generate prompt information indicating whether the interview succeeded or failed.
  • The server determines the comprehensive score of the current question data as above and updates the identifier's existing total score with it.
  • When the updated total score reaches the preset total-score threshold, the server stops the current question data generation process.
  • A question data generation method is provided.
  • The method specifically includes the following steps:
  • S306: Extract keywords from the preprocessed user answer data by matching against a preset keyword library.
  • S312: Construct the knowledge graph in a preset construction manner according to the target entities and the corresponding entity relationships.
  • S320: Generate target question data from the entity, attribute, and target related entity.
  • S322: Send the target question data to the terminal for display.
  • Keywords are extracted from the user answer data for the current question data, matching entities and attributes are queried in the constructed knowledge graph according to the keywords, related entities are determined, and target question data is generated dynamically from the entity, attribute, and related entity and sent to the terminal for display. This makes the target question data less predictable and thus improves the security of the question bank.
  • The method can be applied to the question-and-answer session of a credit interview review, to a job interview, or to the questioning session of a public security or customs inspection and interrogation.
  • Although the steps in the flowcharts of FIGS. 2-3 are displayed in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the order of these steps, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-3 may include multiple sub-steps or stages, which need not be completed at the same moment but may be performed at different times, and need not be performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
  • A question data generation apparatus 400 is provided, including a receiving module 402, an extraction module 404, a query module 406, a generation module 408, and a sending module 410, where:
  • The receiving module 402 is configured to receive user answer data sent by the terminal and corresponding to the current question data.
  • The extraction module 404 is configured to extract keywords from the user answer data based on a preset keyword extraction method.
  • The query module 406 is configured to query matching entities and attributes in the constructed knowledge graph according to the keywords.
  • The generation module 408 is configured to determine target question data according to the queried entities and attributes.
  • The sending module 410 is configured to send the target question data to the terminal for display.
  • The generation module 408 is further configured to: query the knowledge graph, according to the queried entity and attribute, for related entities linked to the entity through the attribute, there being multiple related entities; select a target related entity from them; and generate target question data from the entity, attribute, and target related entity.
  • The extraction module 404 is further configured to preprocess the user answer data and extract keywords from the preprocessed user answer data by matching against a preset keyword library.
  • The question data generation apparatus 400 further includes a knowledge graph construction module 412.
  • The knowledge graph construction module 412 is configured to obtain target data; identify the target entities in the target data and the entity relationships between them; and construct the knowledge graph in a preset construction manner according to the target entities and the corresponding entity relationships.
  • The user answer data includes answer voice information; the extraction module 404 is further configured to perform speech recognition on the answer voice information to obtain the corresponding answer voice text and extract keywords from it based on a preset keyword extraction method.
  • The user answer data includes user answer text and user answer imagery; the question data generation apparatus 400 further includes a determination module 414.
  • The determination module 414 is configured to determine the answer score for the user answer text and the expression score for the user answer imagery, and to determine the comprehensive score from the two.
  • When the comprehensive score is below the preset score threshold, the query module 406 performs the step of querying matching entities and attributes in the constructed knowledge graph according to the keywords.
  • The user answer data corresponds to a user identifier; the determination module 414 is further configured to count the total number of question data items for the user identifier; stop the question data generation process when the count reaches a preset total; determine the total score for the user identifier from the comprehensive scores of its question data items; and push the total score to the terminal for display.
  • Each module of the question data generation apparatus may be implemented wholly or partly in software, hardware, or a combination of the two.
  • The modules may be embedded in or independent of the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke them and perform their operations.
  • In one embodiment, a computer device is provided; it may be a server, and its internal structure may be as shown in FIG. 6.
  • The computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • The processor of the computer device provides computing and control capabilities.
  • The memory of the computer device includes a non-volatile storage medium and an internal memory.
  • The non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium.
  • The database of the computer device stores the constructed knowledge graph.
  • The network interface of the computer device communicates with external terminals over a network connection.
  • FIG. 6 is merely a block diagram of the part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied.
  • A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • A computer device includes a memory and one or more processors.
  • The memory stores computer-readable instructions.
  • When the computer-readable instructions are executed by the one or more processors, the one or more processors implement the steps of the question data generation method provided in any embodiment of this application.
  • One or more non-volatile computer-readable storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to implement the steps of the question data generation method provided in any embodiment of this application.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM random access memory
  • DRAM dynamic RAM
  • SDRAM synchronous DRAM
  • DDRSDRAM double data rate SDRAM
  • ESDRAM enhanced SDRAM
  • SLDRAM Synchlink DRAM
  • RDRAM Rambus direct RAM
  • DRDRAM direct Rambus dynamic RAM
  • RDRAM Rambus dynamic RAM

Abstract

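A question data generation method, including: receiving user answer data sent by a terminal and corresponding to current question data; extracting keywords from the user answer data based on a preset keyword extraction method; querying matching entities and attributes in a constructed knowledge graph according to the keywords; determining target question data according to the queried entities and attributes; and sending the target question data to the terminal for display.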

Description

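Question data generation method and apparatus, computer device, and storage medium
This application claims priority to Chinese Patent Application No. 2018112046061, filed with the China Patent Office on October 16, 2018 and entitled "Question data generation method and apparatus, computer device, and storage medium", the entire contents of which are incorporated herein by reference.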
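Technical Field
This application relates to a question data generation method and apparatus, a computer device, and a storage medium.
Background
Question-and-answer sessions in traditional credit scenarios are conducted mainly by people. With the continuing development of computer technology, systematic human-machine question answering has emerged, in which a terminal asks the user questions in place of a salesperson, reducing the cost of training salespeople. However, systematic human-machine question answering usually determines its questions from a preconfigured, fixed question-bank template: during a credit interview review, the server generates a fixed set of question data from the fixed template, and the terminal displays these questions one after another.
However, the inventors realized that question data in the current credit interview review process is fixed rather than dynamic, which may lead to leakage of the credit interview question bank; the security of the question bank is therefore low.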
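Summary
According to various embodiments disclosed in this application, a question data generation method and apparatus, a computer device, and a storage medium are provided.
A question data generation method includes:
receiving user answer data sent by a terminal and corresponding to current question data;
extracting keywords from the user answer data based on a preset keyword extraction method;
querying matching entities and attributes in a constructed knowledge graph according to the keywords;
determining target question data according to the queried entities and attributes; and
sending the target question data to the terminal for display.
A question data generation apparatus includes:
a receiving module, configured to receive user answer data sent by a terminal and corresponding to current question data;
an extraction module, configured to extract keywords from the user answer data based on a preset keyword extraction method;
a query module, configured to query matching entities and attributes in a constructed knowledge graph according to the keywords;
a generation module, configured to determine target question data according to the queried entities and attributes; and
a sending module, configured to send the target question data to the terminal for display.
A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to implement the steps of the question data generation method provided in any embodiment of this application.
One or more non-volatile computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps of the question data generation method provided in any embodiment of this application.
Details of one or more embodiments of this application are set forth in the accompanying drawings and the description below. Other features and advantages of this application will become apparent from the description, the drawings, and the claims.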
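Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings required by the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a diagram of an application scenario of a question data generation method according to one or more embodiments.
FIG. 2 is a schematic flowchart of a question data generation method according to one or more embodiments.
FIG. 3 is a schematic flowchart of a question data generation method in another embodiment.
FIG. 4 is a block diagram of a question data generation apparatus according to one or more embodiments.
FIG. 5 is a block diagram of a question data generation apparatus in another embodiment.
FIG. 6 is a block diagram of a computer device according to one or more embodiments.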
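Detailed Description
To make the technical solutions and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely explain this application and are not intended to limit it.
The question data generation method provided in this application can be applied in the application environment shown in FIG. 1, in which a terminal 102 communicates with a server 104 over a network. The server 104 receives the user answer data that the terminal 102 returns for the current question data, extracts keywords from the user answer data, queries the constructed knowledge graph for entities and attributes matching the extracted keywords, generates target question data from the queried entities and attributes, and sends the target question data to the terminal 102 for display. The terminal 102 may be, without limitation, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device; the server 104 may be implemented as an independent server or as a cluster of multiple servers.
In one embodiment, as shown in FIG. 2, a question data generation method is provided. Taking its application to the server in FIG. 1 as an example, the method includes the following steps:
S202: Receive user answer data sent by the terminal and corresponding to the current question data.
The current question data is the question currently being put to the user, i.e., the question data for the current round of question-and-answer processing. Specifically, it may be the question data closest to the current time, that is, the latest question. Question data may be text displayed on a screen or voice information presented by voice broadcast. The current question data corresponds to the user answer data and user answer imagery currently returned by the terminal. User answer data is the answer the user returns for the current question data; it may be text entered manually on the terminal, or voice answer information collected by the terminal while the user answers.
Specifically, when the terminal receives the current question data from the server, it presents the question to the user by voice broadcast or on the display and monitors in real time for the user's answer to it. When it detects the user answer data for the current question data, the terminal sends that data to the server.
In one embodiment, the server receives a question-answer instruction, obtains the corresponding current question data according to it, and sends that question data to the terminal. The server may select the current question data from a prestored preset question bank according to the instruction. Alternatively, it may obtain question material from a prestored question material set and generate the current question data from that material, or select an entity and a corresponding attribute from the constructed knowledge graph and generate the current question data from them. The preset question bank is a set of pre-generated questions. The question material set is a collection of question materials, such as ID card number, place of origin, and current residence.
In one embodiment, while presenting the current question data, the terminal detects the user's answer operation in real time and obtains the user answer data from it. An answer operation is, for example, the user typing on the terminal or answering orally.
In one embodiment, when the received user answer data is in voice form, the server performs speech recognition on it to obtain the corresponding answer voice text and performs the subsequent steps of the method on that text. Speech recognition can use existing techniques and is not described further here.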
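S204: Extract keywords from the user answer data based on a preset keyword extraction method.
A preset keyword extraction method is a predefined way of extracting keywords from user answer data. It may be keyword matching against a preset keyword library; word segmentation of the answer followed by filtering the segmented words with preset filter conditions; or extraction with a trained keyword extraction model, which may be obtained by supervised, semi-supervised, or unsupervised training.
Specifically, the server extracts keywords from the received user answer data using the preset keyword extraction method. For example, the server segments the user answer data into words, matches each word against the preset keyword library, and takes the words that match as keywords. The preset keyword library is a set of predefined keywords.
In one embodiment, the server segments the user answer data to obtain a segmentation result, computes a feature weight for each word in the result, ranks the words by weight, and selects keywords from the ranking. Specifically, the server computes the TF-IDF weight of each word, sorts the words in descending order of weight, and takes a preset number of top-ranked words as keywords. The TF-IDF weight of each word is computed as follows. First, compute the term frequency of each word in the segmentation result:
Term frequency TF = (number of times the word occurs in the document) / (total number of words in the document)
Then compute the inverse document frequency IDF of each word:
Inverse document frequency IDF = log(total number of documents in the corpus / (number of documents containing the word + 1))
Finally, multiply TF by IDF to obtain the TF-IDF weight.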
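The three formulas above combine into a short ranking routine. The following is a minimal Python sketch, assuming the answer has already been segmented into a token list and that a small in-memory list of token lists stands in for the corpus; the function name `tf_idf_keywords` and the `top_k` parameter are illustrative, not from the source.

```python
import math
from collections import Counter


def tf_idf_keywords(tokens, corpus, top_k=3):
    """Rank the words of one segmented answer by TF-IDF and return the top_k.

    tokens: words of the answer (the "document").
    corpus: list of token lists, one per corpus document.
    """
    if not tokens:
        return []
    counts = Counter(tokens)
    total_words = len(tokens)
    n_docs = len(corpus)
    weights = {}
    for word, count in counts.items():
        tf = count / total_words                            # term frequency
        df = sum(1 for doc in corpus if word in doc)        # documents containing the word
        idf = math.log(n_docs / (df + 1))                   # inverse document frequency
        weights[word] = tf * idf
    # Descending weight; ties broken alphabetically so the result is deterministic.
    ranked = sorted(weights, key=lambda w: (-weights[w], w))
    return ranked[:top_k]


tokens = ["i", "live", "in", "nanshan", "district"]
corpus = [
    ["i", "live", "in", "futian", "district"],
    ["i", "work", "at", "xx", "group"],
    ["i", "live", "in", "nanshan", "district"],
    ["hello"],
]
print(tf_idf_keywords(tokens, corpus, top_k=2))  # ['nanshan', 'district']
```

Because of the "+1" in the IDF denominator, a word that occurs in every corpus document gets a slightly negative weight, which pushes very common words to the bottom of the ranking.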
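In one embodiment, the server feeds the user answer data into a pre-trained keyword extraction model to obtain the keywords for that data. The keyword extraction model is obtained by training on a pre-collected training sample set.
S206: Query matching entities and attributes in the constructed knowledge graph according to the keywords.
A knowledge graph is a semantic network of entities or concepts and the relationships among them. Its nodes are entity nodes, each corresponding to an entity. An edge connected to an entity node represents an attribute of that node's entity, and the entity at the other end of the edge is the value of that attribute. For example, if an entity node corresponds to XX Group headquarters and an edge connected to it represents the address, the entity node at the other end of that edge corresponds to Futian District, Shenzhen. Entities in a knowledge graph may include concepts, person names, place names, and names of enterprises and institutions. An attribute may be the relationship between the two entities connected by its edge (an entity relationship), or a characteristic of the entity itself.
It can be understood that the knowledge graph may specifically be a knowledge base containing a large number of triples. A triple consists of an entity, an entity relationship, and a related entity linked to the entity through that relationship. The entity relationship in each triple is taken as the attribute between the entity and its related entity, and a directed connection is established between them according to the attribute, thereby constructing the knowledge graph.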
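A minimal in-memory sketch of the triple-based knowledge base described above, in Python. It assumes a plain nested dictionary as the graph store (the source names Neo4j for persistence; this sketch does not use it), and the function names and example triples are hypothetical.

```python
from collections import defaultdict


def build_graph(triples):
    """Turn (entity, relation, related_entity) triples into a directed graph.

    The result maps each entity to {attribute: [related entities]}, so every
    relation becomes a labelled edge pointing from the entity to its related entity.
    """
    graph = defaultdict(lambda: defaultdict(list))
    for entity, relation, related in triples:
        graph[entity][relation].append(related)
    # Convert to plain dicts so that looking up an unknown key never inserts it.
    return {entity: dict(edges) for entity, edges in graph.items()}


def related_entities(graph, entity, attribute):
    """Entities reachable from `entity` along edges labelled `attribute`."""
    return list(graph.get(entity, {}).get(attribute, []))


def attributes_of(graph, entity):
    """Attribute labels on the outgoing edges of `entity`, sorted."""
    return sorted(graph.get(entity, {}))


# Hypothetical example triples.
g = build_graph([
    ("XX Group", "legal representative", "Zhang San"),
    ("XX Group", "address", "Futian District, Shenzhen"),
])
print(attributes_of(g, "XX Group"))                             # ['address', 'legal representative']
print(related_entities(g, "XX Group", "legal representative"))  # ['Zhang San']
```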
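Specifically, the server matches each extracted keyword against the entities and attributes in the constructed knowledge graph and takes the entities and attributes that match as those matching the keyword. When multiple keywords are extracted from the user answer data, the server matches each keyword against the entities in the graph and takes the matched entities as the entities queried for the keywords; likewise, it matches each keyword against the attributes in the graph and takes the matched attributes as the attributes queried for the keywords.
In one embodiment, the server may match synonyms of the extracted keywords against the attributes in the knowledge graph and take the matched attributes as those matching the keyword. The server may also match the extracted keywords against the keywords associated with each attribute in the graph and take the matched attributes as those matching the keyword.
In one embodiment, the server matches an extracted keyword against the entities in the knowledge graph so as to map the keyword onto one entity and thus find the entity matching it. The server uses NLP (natural language processing) to understand the semantics expressed by the user answer data and matches those semantics against the attributes in the graph, identifying the attribute corresponding to the answer and taking it as the attribute matching the keyword. The server may also match the semantics of the current question data against the attributes in the graph to determine the attribute matching the extracted keyword.
In one embodiment, a keyword extracted from the user answer data can be understood as a named entity recognized in that data, and the semantics expressed by the answer can be understood as the attribute of that named entity. Named entities include person names, place names, and names of enterprises and institutions. When multiple keywords are extracted, the server may determine the named entity from them collectively.
In one embodiment, the server queries the knowledge graph for entities matching the recognized named entity. It computes the match rate between the named entity and each entity in the graph and takes as matching those entities whose match rate reaches a preset threshold. The match rate is the degree of matching between the named entity and an entity. For example, when the named entity is an enterprise name, a matching entity may be that name itself or another name of the same enterprise, such as its abbreviation or full name. When the named entity is a place name, a matching entity may be that place name or the name of another address within a preset range of the address it denotes.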
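The source does not define how the match rate is computed. As a stand-in, the following Python sketch uses the standard library's `difflib.SequenceMatcher.ratio()` as a string-similarity match rate and an assumed threshold of 0.8; it illustrates only the threshold filter, not the abbreviation or geographic-range matching mentioned above.

```python
from difflib import SequenceMatcher


def match_rate(a, b):
    """Similarity in [0, 1]; difflib's ratio() stands in for the match rate."""
    return SequenceMatcher(None, a, b).ratio()


def match_entities(named_entity, graph_entities, threshold=0.8):
    """Return graph entities whose match rate with `named_entity` reaches
    `threshold` (an assumed value), best match first."""
    hits = [(match_rate(named_entity, e), e) for e in graph_entities]
    hits = [(rate, e) for rate, e in hits if rate >= threshold]
    # Python's sort is stable, so equal rates keep their input order.
    hits.sort(key=lambda pair: -pair[0])
    return [e for _, e in hits]


entities = [
    "No. 58 Keyuan Road, Nanshan District",
    "No. 59 Keyuan Road, Nanshan District",
    "XX Bank",
]
print(match_entities("No. 58 Keyuan Road, Nanshan District", entities))
# ['No. 58 Keyuan Road, Nanshan District', 'No. 59 Keyuan Road, Nanshan District']
```

A pure string ratio happens to rank the neighbouring house number highly here, which is consistent with the worked example that follows, but a production system would more likely combine an alias table and geocoded distances.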
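For example, suppose the current question data is "Where do you currently live?" and the corresponding user answer data is "I currently live at No. 58 Keyuan Road, Nanshan District." The named entity recognized from the answer is "No. 58 Keyuan Road, Nanshan District", and its attribute can be determined as "address" by the keyword extraction or semantic understanding described above. The entity that the server finds in the knowledge graph matching this named entity may be "No. 58 Keyuan Road, Nanshan District", "No. 59 Keyuan Road, Nanshan District", and so on.
In one embodiment, when the current question data is a yes/no question, the server may take the content text of the user answer data as its keyword. Further, the server may determine the semantics of the answer from the current question data together with the keyword extracted from the answer, and query the constructed knowledge graph for entities and attributes matching those semantics. For example, if the current question is "There is an XX Bank at No. 59 Keyuan Road, Nanshan District, right?" and the user answers "Yes", the entity and attribute determined from the semantics of the answer may be "No. 57 Keyuan Road, Nanshan District" and "address", respectively.
S208: Determine target question data according to the queried entities and attributes.
Target question data is the question put to the user next, after the current question data. It may be text displayed on a screen or voice information presented by voice broadcast.
Specifically, according to the queried entity and attribute, the server queries the knowledge graph for related entities linked to the entity through that attribute, and generates the target question data from the entity, the attribute, and the related entity.
For example, continuing the example above, suppose the entity found in the knowledge graph for the recognized named entity is "No. 59 Keyuan Road, Nanshan District" and, via the named entity's attribute "address", the related entity found is "XX Bank". The generated target question may then be "There is an XX Bank at No. 59 Keyuan Road, Nanshan District, right?" or "There is an XX Bank near No. 58 Keyuan Road, Nanshan District, right?"
In one embodiment, the server queries the constructed knowledge graph for the entity matching a keyword extracted from the user answer data and determines one or more edges connected to that entity. These edges represent attributes of the entity; the server may select one of the attributes and generate target question data from the selected attribute and the entity. In one embodiment, the server further queries the knowledge graph for one or more related entities linked to the entity through those edges; these related entities are the values of the attributes. The server selects one attribute, selects the related entity corresponding to it, and generates target question data from the selected attribute, the related entity, and the entity.
For example, suppose the current question is "Which company do you work for?" and the user answers "I work at XX Group." The entity found from the keyword in the answer is "XX Group", and the attributes found for it in the knowledge graph may include "address", "legal representative", and "founding date". The generated target question may then be "Who is the legal representative of XX Group?" or "The legal representative of XX Group is Zhang San, right?"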
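The source gives example questions but no generation mechanism. One simple possibility is template filling, sketched below in Python. The templates, the `TEMPLATES` table, and `generate_question` are hypothetical; an open question is produced when no related entity is supplied, and a confirmation question otherwise.

```python
# Hypothetical templates: (open question, confirmation question).
DEFAULT_TEMPLATES = (
    "What is the {attribute} of {entity}?",
    "The {attribute} of {entity} is {value}, right?",
)
TEMPLATES = {
    "legal representative": (
        "Who is the legal representative of {entity}?",
        "The legal representative of {entity} is {value}, right?",
    ),
}


def generate_question(entity, attribute, related_entity=None):
    """Build target question text from an entity, one of its attributes and,
    optionally, the related entity (attribute value) found in the graph."""
    open_form, confirm_form = TEMPLATES.get(attribute, DEFAULT_TEMPLATES)
    if related_entity is None:
        # str.format ignores unused keyword arguments, so one call fits every template.
        return open_form.format(entity=entity, attribute=attribute)
    return confirm_form.format(entity=entity, attribute=attribute, value=related_entity)


print(generate_question("XX Group", "legal representative"))
# Who is the legal representative of XX Group?
print(generate_question("XX Group", "legal representative", "Zhang San"))
# The legal representative of XX Group is Zhang San, right?
```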
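S210: Send the target question data to the terminal for display.
Specifically, the server sends the target question data determined from the queried entities and attributes to the terminal over the network, and the terminal presents it to the user on the display or by voice broadcast.
In one embodiment, while presenting the received target question data by voice broadcast or on the display, the terminal monitors in real time for the user's answer to it and sends the detected user answer data to the server, so that the server continues with the steps of the method based on that answer.
In one embodiment, the server sends the target question data to the user terminal, which presents it to the user. The server may also send the target question data to a salesperson's terminal for display.
With the above question data generation method, the user answer data returned by the terminal for the current question data is received, keywords are extracted from it based on a preset keyword extraction method, entities and attributes matching the keywords are queried in the constructed knowledge graph, and target question data is generated from the queried entities and attributes, which improves the accuracy of the target question data. Because the next question is determined dynamically from the user's answer and the constructed knowledge graph, the target question data is not predetermined, which effectively prevents leakage of a question bank made up of fixed questions and thereby improves its security.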
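In one embodiment, step S208 includes: querying the knowledge graph, according to the queried entity and attribute, for related entities linked to the entity through the attribute, there being multiple related entities; selecting a target related entity from the related entities; and generating target question data from the entity, the attribute, and the target related entity.
A related entity is an entity linked to another entity in the knowledge graph through an edge (attribute). It can be understood that both the entity and the related entity are entity nodes in the graph and are linked through the attribute, which expresses the entity relationship between them. Two entities linked through an attribute are related entities of each other. A related entity may be the value of the corresponding entity's attribute, or another entity linked to the corresponding entity through the entity relationship represented by the attribute.
Specifically, according to the queried entity and attribute, the server finds in the knowledge graph multiple related entities linked to the entity through the attribute, selects a target related entity from them, and generates target question data from the target related entity together with the entity and attribute.
In one embodiment, the server may randomly select one of the related entities as the target related entity. Alternatively, the server may obtain the user identifier corresponding to the received user answer data, count the related entities found, compute a hash value of the user identifier based on that count, and select the target related entity according to the hash value. The server may also select the target related entity from the related entities in turn, by polling (round-robin).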
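The three selection strategies can be sketched in Python as follows. The hash strategy is interpreted here as "hash of the user identifier modulo the number of candidates", which is an assumption; SHA-256 is chosen only because Python's built-in `hash()` of a string is randomized per process and would not give a stable choice. All names are hypothetical.

```python
import hashlib
import random


def pick_random(candidates, rng=random):
    """Strategy 1: choose a related entity uniformly at random."""
    return rng.choice(candidates)


def pick_by_user_hash(candidates, user_id):
    """Strategy 2: hash the user identifier and reduce it modulo the number of
    candidates, so the same user and candidate list always give the same pick."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return candidates[int.from_bytes(digest, "big") % len(candidates)]


class RoundRobinPicker:
    """Strategy 3: cycle through the candidates, one per call, per query key."""

    def __init__(self):
        self._next_index = {}

    def pick(self, key, candidates):
        i = self._next_index.get(key, 0)
        self._next_index[key] = i + 1
        return candidates[i % len(candidates)]


banks = ["XX Bank", "YY Bank", "ZZ Bank"]  # hypothetical related entities
picker = RoundRobinPicker()
print([picker.pick("No. 59 Keyuan Road/address", banks) for _ in range(4)])
# ['XX Bank', 'YY Bank', 'ZZ Bank', 'XX Bank']
```

The hash strategy spreads different users across different follow-up questions while keeping each user's question stable on retry; round-robin instead guarantees even coverage of the candidates over time.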
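In the above embodiment, a target related entity is selected from multiple related entities and target question data is generated from it together with the entity and attribute, which further increases the unpredictability of the target question data and thus the security of the question bank.
In one embodiment, step S204 includes: preprocessing the user answer data; and extracting keywords from the preprocessed user answer data by matching against a preset keyword library.
Preprocessing includes word segmentation and stop-word removal. Word segmentation divides text-form user answer data into individual words. Many segmentation algorithms exist, such as string-matching-based, semantic-analysis-based, and statistics-based algorithms; string-matching algorithms include forward maximum matching, reverse maximum matching, minimum segmentation, and bidirectional maximum matching. A trained segmentation model, such as a hidden Markov model or a CRF (conditional random field) model, may also be used to segment the user answer data. Stop words are characters or words that are automatically filtered out before or after processing natural-language data (text) in information retrieval, to save storage space and improve retrieval efficiency; they include modal particles, pleasantries, prepositions, and conjunctions, such as the Chinese particles 的, 吗, 呢, and 啊. The preset keyword library is a set of predefined keywords and may also contain synonyms of each preset keyword.
Specifically, the server segments the received user answer data into words and removes stop words from the segmentation result to obtain a candidate keyword set. The server may match each word in the segmentation result against a prebuilt stop-word list, treat the words that match as stop words, and remove them; the remaining words are the candidate keywords. The server then matches each candidate keyword against the preset keywords in the preset keyword library and their synonyms. A successful match indicates that the candidate is a preset keyword or a synonym of one, and the server takes that candidate as a keyword extracted from the preprocessed user answer data.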
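A minimal Python sketch of the preprocessing and library-matching steps above. The regular-expression tokenizer on lower-cased English text is only a stand-in for a real Chinese word segmenter, and the stop-word list and keyword library (keyword mapped to its synonyms) are hypothetical. As in the description, the matched candidate itself is returned.

```python
import re

# Hypothetical stop words and keyword library (preset keyword -> synonyms).
STOP_WORDS = {"i", "am", "at", "in", "the", "a", "um", "well"}
KEYWORD_LIBRARY = {
    "live": {"reside", "stay"},
    "work": {"employed", "job"},
    "nanshan": set(),
}


def preprocess(text, stop_words=STOP_WORDS):
    """Segment the answer and drop stop words.

    Splitting lower-cased text on non-word characters stands in for Chinese
    word segmentation (maximum matching, HMM, CRF, and so on)."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stop_words]


def extract_keywords(text, library=KEYWORD_LIBRARY, stop_words=STOP_WORDS):
    """Keep candidates that equal a preset keyword or one of its synonyms."""
    vocabulary = set(library)
    for synonyms in library.values():
        vocabulary |= synonyms
    return [t for t in preprocess(text, stop_words) if t in vocabulary]


print(extract_keywords("Um, I reside in Nanshan. I am employed at XX Group."))
# ['reside', 'nanshan', 'employed']
```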
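In one embodiment, the server processes the user answer data with NLP (natural language processing) techniques and performs semantic analysis and understanding on the processed data to obtain the keywords. In one embodiment, the server preprocesses the user answer data and extracts keywords from the preprocessed data by dictionary- and regular-expression-based matching. Specifically, the server matches each candidate keyword obtained from preprocessing against the preset keywords in a preset dictionary and takes the candidates that match as the extracted keywords.
In the above embodiment, extracting keywords with a preset keyword library makes the keywords extracted from the user answer data more accurate.
In one embodiment, before step S206, the method further includes: obtaining target data; identifying the target entities in the target data and the entity relationships between them; and constructing the knowledge graph in a preset construction manner according to the target entities and the corresponding entity relationships.
Target data is the raw data used to construct the knowledge graph. It may be raw data from one or more specified domains, such as finance and manufacturing. Taking finance as an example, the finance domain may include industries such as banking, securities, insurance, trusts, credit, and funds. Each industry may include many enterprises and institutions, each with its own raw enterprise data, which may include internal and external data. Internal data is stored in the enterprise's local database, such as operational and business data; external data can be crawled from other storage through third-party platforms, such as business registration data, banking regulatory data, People's Bank of China data, and company annual reports. Raw data for a group enterprise may also include raw data on each subsidiary, such as investment relationships, founding dates, and addresses. Raw data for an enterprise that includes a bank may also include the address of each branch.
Specifically, upon receiving an instruction to construct the knowledge graph, the server obtains published raw data from various channels according to the instruction and obtains prestored, unpublished raw data from the local database, together forming the target data. The server processes the target data and identifies the target entities in it and the entity relationships between them. It takes the entity relationship between any two target entities as the attribute between them and establishes a directed connection between the two entities according to that attribute. By performing these steps for every target entity and entity relationship identified in the target data, the server constructs the knowledge graph corresponding to the target data.
In one embodiment, the server may process the target data using data warehouse technology. The server saves the relational data in the knowledge graph constructed as above to a graph database, such as Neo4j (a high-performance NoSQL graph database).
In one embodiment, the target data includes raw data from multiple specified domains. For the raw data of each specified domain, the server may take the domain as the vertex (core node) of a knowledge graph and determine the related nodes connected to the core node according to the entity relationships identified from the target data. Similarly, the server determines, according to the entity relationships, the related nodes connected to each related node, takes the entity relationships between nodes as attributes, and constructs the knowledge graph from the determined nodes and attributes. Both core nodes and related nodes are nodes (entity nodes) in the knowledge graph. In one embodiment, the server may construct a knowledge graph for each specified domain as above and, based on the entities or attributes shared by any two specified domains, fuse the two domains' knowledge graphs into a cross-domain knowledge graph. An entity or attribute shared by two specified domains is one contained in the knowledge graphs of both domains.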
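The fusion step can be illustrated with a small Python sketch, assuming each domain graph is a dictionary of the form `{entity: {attribute: set(related entities)}}` and that shared entities are identified by identical names. The source also allows fusing on shared attributes; this sketch covers only the shared-entity case, and the example domain data is hypothetical.

```python
def fuse_graphs(graph_a, graph_b):
    """Merge two domain graphs of the form {entity: {attribute: set(related)}}.

    Entities present in both graphs are merged into one node, so the shared
    entities become the bridge between the two domains. Returns the fused
    graph and the set of shared entities; the inputs are not modified."""
    shared = set(graph_a) & set(graph_b)
    fused = {}
    for graph in (graph_a, graph_b):
        for entity, edges in graph.items():
            node = fused.setdefault(entity, {})
            for attribute, related in edges.items():
                node.setdefault(attribute, set()).update(related)
    return fused, shared


finance = {
    "XX Group": {"legal representative": {"Zhang San"}},
    "XX Bank": {"parent company": {"XX Group"}},
}
manufacturing = {"XX Group": {"subsidiary": {"XX Factory"}}}
fused, shared = fuse_graphs(finance, manufacturing)
print(shared)  # {'XX Group'}
```

After fusion, a question generator can walk from a finance-domain entity such as "XX Bank" through "XX Group" to manufacturing-domain facts, which neither domain graph supports on its own.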
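In the above embodiment, entities and entity relationships are identified from the target data and the knowledge graph is constructed from them, so that the constructed graph can be used directly when question data is generated dynamically, which improves generation efficiency.
In one embodiment, the user answer data includes answer voice information, and step S204 includes: performing speech recognition on the answer voice information to obtain the corresponding answer voice text; and extracting keywords from the answer voice text based on the preset keyword extraction method.
Answer voice information is the voice information collected while the user answers the current question data. Answer voice text is the text content carried in the answer voice information.
Specifically, the server obtains the answer voice information from the received user answer data and performs speech recognition on it to extract the answer voice text. The server segments the extracted answer voice text into words, matches each word against the preset keyword library, and takes the words that match as the keywords extracted from the answer voice text.
In one embodiment, the server extracts the answer voice text from the answer voice information with a pre-trained speech recognition model and extracts keywords from that text with a pre-trained keyword extraction model. It can be understood that the server may also use other existing speech recognition and keyword extraction techniques, which are not described further here.
In one embodiment, the server performs voiceprint recognition on the answer voice information to extract a target voiceprint feature and uses that feature to authenticate the user answering the current question data. When authentication succeeds, the server proceeds to the step of performing speech recognition on the answer voice information to obtain the answer voice text.
In the above embodiment, when the user answer data includes answer voice information, the answer voice text is recognized with speech recognition and keywords are then extracted from it, which improves the efficiency of processing user answer data and hence of generating question data.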
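In one embodiment, the user answer data includes user answer text and user answer imagery. Before step S206, the method further includes: determining an answer score for the user answer text and an expression score for the user answer imagery; determining a comprehensive score from the answer score and the expression score; and, when the comprehensive score is below a preset score threshold, performing the step of querying matching entities and attributes in the constructed knowledge graph according to the keywords.
User answer text is answer data in text form. The terminal may obtain it from the user's manual input, or use the content recognized from the voice information collected while the user answers. User answer imagery is the image information collected while the user answers the current question data; it may be information in the form of images or video captured by an image collector. User answer imagery may include, but is not limited to, user answer pictures and user answer videos. A user answer picture is, for example, a picture containing the user's face captured while the user answers; a user answer video is, for example, video captured during the user's answer operation. The image collector may be a camera, either built into the terminal or a separate component connected point-to-point to the terminal. The user answer imagery corresponds to the user answer text.
Specifically, the server obtains the user answer text and the user answer imagery from the received user answer data. It determines the answer score for the text with a preset answer-score method and the expression score for the imagery with a preset expression-score method. From these two scores it determines the comprehensive score of the user answer data and compares it with a preset score threshold. When the comprehensive score is below the threshold, the server performs the step of querying the constructed knowledge graph for entities and attributes matching the keywords extracted from the user answer text contained in the user answer data.
The preset answer-score method may determine the answer score from the match rate between the user answer text and the corresponding preset answer together with the preset score for the current question data; it may match the keywords in the user answer text against preset keywords to determine the score; or it may feed the user answer text into a trained answer-score prediction model to obtain the score. The preset expression-score method may extract the user's micro-expressions from the user answer imagery and determine the expression score from them using micro-expression recognition.
In one embodiment, the server obtains multiple frames of the user's face from the user answer imagery, determines an expression score for each frame using micro-expression recognition, and determines the expression score of the user answer imagery from these per-frame scores. In one embodiment, when the expression score of the user answer imagery is below a preset expression-score threshold, the server performs the step of querying matching entities and attributes in the constructed knowledge graph according to the keywords.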
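The source does not say how the answer score and expression score are combined or how per-frame scores are aggregated. The Python sketch below assumes a weighted blend with a hypothetical weight of 0.7, averages the per-frame expression scores, uses `difflib`'s similarity ratio as a stand-in match rate, and uses a hypothetical threshold of 60. Micro-expression recognition itself is out of scope; per-frame scores are taken as inputs.

```python
from difflib import SequenceMatcher
from statistics import mean


def answer_score(answer_text, preset_answer, full_marks):
    """Match rate with the preset answer times the question's preset score.
    difflib's ratio() is a stand-in for the match rate."""
    return SequenceMatcher(None, answer_text, preset_answer).ratio() * full_marks


def expression_score(frame_scores):
    """Combine per-frame micro-expression scores; averaging is an assumption."""
    return mean(frame_scores)


def comprehensive_score(ans_score, frame_scores, answer_weight=0.7):
    """Weighted blend of the answer and expression scores (weight is hypothetical)."""
    return answer_weight * ans_score + (1 - answer_weight) * expression_score(frame_scores)


def should_query_graph(ans_score, frame_scores, threshold=60.0):
    """True when the comprehensive score is below the threshold, meaning the
    answer is doubtful and a follow-up question should be generated."""
    return comprehensive_score(ans_score, frame_scores) < threshold


print(should_query_graph(40, [30, 50]))  # True: 0.7 * 40 + 0.3 * 40 = 40 < 60
print(should_query_graph(90, [80]))      # False: 0.7 * 90 + 0.3 * 80 = 87
```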
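In the above embodiment, a comprehensive score below the preset score threshold indicates that the correctness of the user answer text is in doubt; entities and attributes related to the current question data are then selected from the constructed knowledge graph to generate the target question data dynamically, making it more accurate.
In one embodiment, the user answer data corresponds to a user identifier, and the method further includes: counting the total number of question data items for the user identifier; stopping the question data generation process when the count reaches a preset total; determining a total score for the user identifier from the comprehensive scores of its question data items; and pushing the total score to the terminal for display.
Specifically, the server receives the user answer data that the terminal returns for the user identifier and the current question data, counts the question data items for that identifier, and compares the count with the preset total. When the count reaches the preset total, the server stops the current question data generation process. The server determines the comprehensive score of the current question data as above, obtains the comprehensive scores of the identifier's existing question data items, and sums the current and existing scores, either directly or with weights, to determine the total score for the identifier, which it sends to the terminal for display.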
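The per-user bookkeeping described above can be sketched in Python as a small class. The class name, the in-memory storage, and the convention that `record` returns whether generation should continue are all assumptions made for illustration.

```python
from collections import defaultdict


class ScoreTracker:
    """Per-user bookkeeping for the stop condition and the total score."""

    def __init__(self, max_questions):
        self.max_questions = max_questions  # preset total number of questions
        self._scores = defaultdict(list)    # user identifier -> comprehensive scores

    def record(self, user_id, score):
        """Store the comprehensive score of one answered question.
        Returns True while more questions should be generated for this user."""
        self._scores[user_id].append(score)
        return len(self._scores[user_id]) < self.max_questions

    def total(self, user_id, weights=None):
        """Direct sum by default, or a weighted sum when weights are given."""
        scores = self._scores.get(user_id, [])
        if weights is None:
            return sum(scores)
        if len(weights) != len(scores):
            raise ValueError("need exactly one weight per recorded score")
        return sum(w * s for w, s in zip(weights, scores))


tracker = ScoreTracker(max_questions=3)
for score in (70, 80, 90):
    keep_going = tracker.record("user-001", score)
print(keep_going, tracker.total("user-001"))  # False 240
```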
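In one embodiment, the server generates prompt information from the calculated total score and a preset total-score threshold and pushes it to the terminal. For example, when the method is applied to the question-and-answer session of a credit interview review, the server may generate prompt information indicating whether the review passed or failed; a credit interview review is the verification of the user's identity during credit business processing. When the method is applied to a job interview, the server may generate prompt information indicating whether the interview succeeded or failed.
In one embodiment, the server determines the comprehensive score of the current question data as above and updates the existing total score of the user identifier with it. When the updated total score reaches the preset total-score threshold, the server stops the current question data generation process.
As shown in FIG. 3, in one embodiment, a question data generation method is provided, which specifically includes the following steps:
S302: Receive user answer data sent by the terminal and corresponding to the current question data.
S304: Preprocess the user answer data.
S306: Extract keywords from the preprocessed user answer data by matching against a preset keyword library.
S308: Obtain target data.
S310: Identify the target entities in the target data and the entity relationships between them.
S312: Construct the knowledge graph in a preset construction manner according to the target entities and the corresponding entity relationships.
S314: Query matching entities and attributes in the constructed knowledge graph according to the keywords.
S316: According to the queried entity and attribute, query the knowledge graph for related entities linked to the entity through the attribute; there are multiple related entities.
S318: Select a target related entity from the related entities.
S320: Generate target question data from the entity, attribute, and target related entity.
S322: Send the target question data to the terminal for display.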
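In the above embodiment, keywords are extracted from the user answer data for the current question data, matching entities and attributes are queried in the constructed knowledge graph according to the keywords, related entities are determined, and target question data is generated dynamically from the entity, attribute, and related entity and sent to the terminal for display. This makes the target question data less predictable and thus improves the security of the question bank.
In one embodiment, the method may be applied to the question-and-answer session of a credit interview review, to a job interview, or to the questioning session of a public security or customs inspection and interrogation.
It should be understood that although the steps in the flowcharts of FIGS. 2-3 are displayed in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the order of these steps, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-3 may include multiple sub-steps or stages, which need not be completed at the same moment but may be performed at different times, and need not be performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.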
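In one embodiment, as shown in FIG. 4, a question data generation apparatus 400 is provided, including a receiving module 402, an extraction module 404, a query module 406, a generation module 408, and a sending module 410, where:
The receiving module 402 is configured to receive user answer data sent by the terminal and corresponding to the current question data.
The extraction module 404 is configured to extract keywords from the user answer data based on a preset keyword extraction method.
The query module 406 is configured to query matching entities and attributes in the constructed knowledge graph according to the keywords.
The generation module 408 is configured to determine target question data according to the queried entities and attributes.
The sending module 410 is configured to send the target question data to the terminal for display.
In one embodiment, the generation module 408 is further configured to: query the knowledge graph, according to the queried entity and attribute, for related entities linked to the entity through the attribute, there being multiple related entities; select a target related entity from the related entities; and generate target question data from the entity, attribute, and target related entity.
In one embodiment, the extraction module 404 is further configured to preprocess the user answer data and extract keywords from the preprocessed user answer data by matching against a preset keyword library.
As shown in FIG. 5, in one embodiment, the question data generation apparatus 400 further includes a knowledge graph construction module 412.
The knowledge graph construction module 412 is configured to obtain target data; identify the target entities in the target data and the entity relationships between them; and construct the knowledge graph in a preset construction manner according to the target entities and the corresponding entity relationships.
In one embodiment, the user answer data includes answer voice information, and the extraction module 404 is further configured to perform speech recognition on the answer voice information to obtain the corresponding answer voice text and extract keywords from the answer voice text based on the preset keyword extraction method.
In one embodiment, the user answer data includes user answer text and user answer imagery, and the question data generation apparatus 400 further includes a determination module 414.
The determination module 414 is configured to determine the answer score for the user answer text and the expression score for the user answer imagery; determine the comprehensive score from the answer score and the expression score; and, when the comprehensive score is below the preset score threshold, cause the query module 406 to perform the step of querying matching entities and attributes in the constructed knowledge graph according to the keywords.
In one embodiment, the user answer data corresponds to a user identifier, and the determination module 414 is further configured to count the total number of question data items for the user identifier; stop the question data generation process when the count reaches a preset total; determine the total score for the user identifier from the comprehensive scores of its question data items; and push the total score to the terminal for display.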
关于提问数据生成装置的具体限定可以参见上文中对于提问数据生成方法的限定,在此不再赘述。上述提问数据生成装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。
在其中一个实施例中,提供了一种计算机设备,该计算机设备可以是服务器,其内部结构图可以如图6所示。该计算机设备包括通过系统总线连接的处理器、存储器、网络接口和数据库。该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统、计算机可读指令和数据库。该内存储器为非易失性存储介质中的操作系统和计算机可读指令的运行提供环境。该计算机设备的数据库用于存储已构建的知识图谱。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机可读指令被处理器执行时以实现一种提问数据生成方法。
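Those skilled in the art will understand that the structure shown in FIG. 6 is only a block diagram of the part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.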
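A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to implement the steps of the question data generation method provided in any embodiment of this application.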
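One or more non-volatile computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps of the question data generation method provided in any embodiment of this application.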
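Those of ordinary skill in the art will understand that all or part of the processes in the above method embodiments may be carried out by computer-readable instructions instructing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods above. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).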
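The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of the technical features in the above embodiments is described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.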
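The above embodiments express only several implementations of this application. Although they are described in relative detail, they should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art may make several variations and improvements without departing from the concept of this application, and all of these fall within the protection scope of this application. Therefore, the protection scope of this patent shall be subject to the appended claims.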

Claims (20)

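  1. A question data generation method, comprising:
    receiving, from a terminal, user answer data corresponding to current question data;
    extracting keywords from the user answer data based on a preset keyword extraction method;
    querying a constructed knowledge graph for an entity and an attribute that match the keywords;
    determining target question data from the queried entity and attribute; and
    sending the target question data to the terminal for display.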
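  2. The method according to claim 1, wherein the determining target question data from the queried entity and attribute comprises:
    querying the knowledge graph, based on the queried entity and attribute, for associated entities that are associated with the entity via the attribute, there being a plurality of associated entities;
    selecting a target associated entity from the associated entities; and
    generating the target question data from the entity, the attribute, and the target associated entity.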
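  3. The method according to claim 1, wherein the extracting keywords from the user answer data based on a preset keyword extraction method comprises:
    preprocessing the user answer data; and
    extracting keywords from the preprocessed user answer data by matching against a preset keyword library.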
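  4. The method according to claim 1, wherein before the querying a constructed knowledge graph for an entity and an attribute that match the keywords, the method further comprises:
    acquiring target data;
    identifying target entities in the target data and entity relations between the target entities; and
    constructing the knowledge graph from the target entities and the corresponding entity relations according to a preset construction method.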
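  5. The method according to any one of claims 1 to 4, wherein the user answer data comprises answer speech information, and the extracting keywords from the user answer data based on a preset keyword extraction method comprises:
    performing speech recognition on the answer speech information to obtain a corresponding answer speech text; and
    extracting keywords from the answer speech text based on the preset keyword extraction method.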
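  6. The method according to any one of claims 1 to 4, wherein the user answer data comprises a user answer text and a user answer image, and before the querying a constructed knowledge graph for an entity and an attribute that match the keywords, the method further comprises:
    determining an answer score corresponding to the user answer text, and determining an expression score corresponding to the user answer image;
    determining a corresponding composite score from the answer score and the expression score; and
    when the composite score is below a preset score threshold, performing the step of querying the constructed knowledge graph for the entity and attribute that match the keywords.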
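  7. The method according to claim 6, wherein the user answer data corresponds to a user identifier, and the method further comprises:
    counting a total number of pieces of question data corresponding to the user identifier;
    stopping the question data generation process when the counted total number reaches a preset total number;
    determining a total score corresponding to the user identifier from the composite scores corresponding to the pieces of question data associated with the user identifier; and
    pushing the total score to the terminal for display.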
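  8. A question data generation apparatus, comprising:
    a receiving module, configured to receive, from a terminal, user answer data corresponding to current question data;
    an extraction module, configured to extract keywords from the user answer data based on a preset keyword extraction method;
    a query module, configured to query a constructed knowledge graph for an entity and an attribute that match the keywords;
    a generation module, configured to determine target question data from the queried entity and attribute; and
    a sending module, configured to send the target question data to the terminal for display.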
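  9. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    receiving, from a terminal, user answer data corresponding to current question data;
    extracting keywords from the user answer data based on a preset keyword extraction method;
    querying a constructed knowledge graph for an entity and an attribute that match the keywords;
    determining target question data from the queried entity and attribute; and
    sending the target question data to the terminal for display.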
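  10. The computer device according to claim 9, wherein the determining target question data from the queried entity and attribute comprises:
    querying the knowledge graph, based on the queried entity and attribute, for associated entities that are associated with the entity via the attribute, there being a plurality of associated entities;
    selecting a target associated entity from the associated entities; and
    generating the target question data from the entity, the attribute, and the target associated entity.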
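  11. The computer device according to claim 9, wherein the extracting keywords from the user answer data based on a preset keyword extraction method comprises:
    preprocessing the user answer data; and
    extracting keywords from the preprocessed user answer data by matching against a preset keyword library.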
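  12. The computer device according to claim 9, wherein the computer-readable instructions, when executed by the processor, cause the processor, before performing the querying a constructed knowledge graph for an entity and an attribute that match the keywords, to further perform the following steps:
    acquiring target data;
    identifying target entities in the target data and entity relations between the target entities; and
    constructing the knowledge graph from the target entities and the corresponding entity relations according to a preset construction method.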
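  13. The computer device according to any one of claims 9 to 12, wherein the user answer data comprises answer speech information, and the extracting keywords from the user answer data based on a preset keyword extraction method comprises:
    performing speech recognition on the answer speech information to obtain a corresponding answer speech text; and
    extracting keywords from the answer speech text based on the preset keyword extraction method.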
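  14. The computer device according to any one of claims 9 to 12, wherein the user answer data comprises a user answer text and a user answer image, and the computer-readable instructions, when executed by the processor, cause the processor, before performing the querying a constructed knowledge graph for an entity and an attribute that match the keywords, to further perform the following steps:
    determining an answer score corresponding to the user answer text, and determining an expression score corresponding to the user answer image;
    determining a corresponding composite score from the answer score and the expression score; and
    when the composite score is below a preset score threshold, performing the step of querying the constructed knowledge graph for the entity and attribute that match the keywords.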
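  15. The computer device according to claim 14, wherein the user answer data corresponds to a user identifier, and the processor, when executing the computer-readable instructions, further performs the following steps:
    counting a total number of pieces of question data corresponding to the user identifier;
    stopping the question data generation process when the counted total number reaches a preset total number;
    determining a total score corresponding to the user identifier from the composite scores corresponding to the pieces of question data associated with the user identifier; and
    pushing the total score to the terminal for display.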
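  16. One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    receiving, from a terminal, user answer data corresponding to current question data;
    extracting keywords from the user answer data based on a preset keyword extraction method;
    querying a constructed knowledge graph for an entity and an attribute that match the keywords;
    determining target question data from the queried entity and attribute; and
    sending the target question data to the terminal for display.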
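  17. The storage media according to claim 16, wherein the determining target question data from the queried entity and attribute comprises:
    querying the knowledge graph, based on the queried entity and attribute, for associated entities that are associated with the entity via the attribute, there being a plurality of associated entities;
    selecting a target associated entity from the associated entities; and
    generating the target question data from the entity, the attribute, and the target associated entity.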
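  18. The storage media according to claim 16, wherein the extracting keywords from the user answer data based on a preset keyword extraction method comprises:
    preprocessing the user answer data; and
    extracting keywords from the preprocessed user answer data by matching against a preset keyword library.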
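  19. The storage media according to claim 16, wherein the computer-readable instructions, when executed by the processor, cause the processor, before performing the querying a constructed knowledge graph for an entity and an attribute that match the keywords, to further perform the following steps:
    acquiring target data;
    identifying target entities in the target data and entity relations between the target entities; and
    constructing the knowledge graph from the target entities and the corresponding entity relations according to a preset construction method.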
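  20. The storage media according to any one of claims 16 to 19, wherein the user answer data comprises answer speech information, and the extracting keywords from the user answer data based on a preset keyword extraction method comprises:
    performing speech recognition on the answer speech information to obtain a corresponding answer speech text; and
    extracting keywords from the answer speech text based on the preset keyword extraction method.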
PCT/CN2019/070844 2018-10-16 2019-01-08 Question data generation method and apparatus, computer device, and storage medium WO2020077896A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
SG11201913916QA SG11201913916QA (en) 2018-10-16 2019-01-08 Question data generation method and apparatus, computer device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811204606.1A CN109543007A (zh) 2018-10-16 2018-10-16 Question data generation method and apparatus, computer device, and storage medium
CN201811204606.1 2018-10-16

Publications (1)

Publication Number Publication Date
WO2020077896A1 (zh) 2020-04-23

Family

ID=65844063

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/070844 WO2020077896A1 (zh) 2018-10-16 2019-01-08 Question data generation method and apparatus, computer device, and storage medium

Country Status (3)

Country Link
CN (1) CN109543007A (zh)
SG (1) SG11201913916QA (zh)
WO (1) WO2020077896A1 (zh)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
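CN110135888B (zh) * 2019-04-12 2023-08-08 平安科技(深圳)有限公司 Product information pushing method and apparatus, computer device, and storage medium
CN110134795A (zh) * 2019-04-17 2019-08-16 深圳壹账通智能科技有限公司 Method and apparatus for generating a verification question group, computer device, and storage medium
CN110135800A (zh) * 2019-04-23 2019-08-16 南京葡萄诚信息科技有限公司 Artificial-intelligence video interview method and system
CN110321408B (zh) * 2019-05-30 2023-07-14 广东省智湾汇科技有限公司 Knowledge-graph-based search method and apparatus, computer device, and storage medium
CN110427477B (zh) * 2019-08-08 2021-09-10 思必驰科技股份有限公司 Heuristic questioning method and apparatus for a story machine
CN110705310B (zh) * 2019-09-20 2023-07-18 北京金山数字娱乐科技有限公司 Article generation method and apparatus
CN110928984A (zh) * 2019-09-30 2020-03-27 珠海格力电器股份有限公司 Knowledge graph construction method and apparatus, terminal, and storage medium
CN110738061A (zh) * 2019-10-17 2020-01-31 北京搜狐互联网信息服务有限公司 Classical Chinese poetry generation method, apparatus, device, and storage medium
CN111626052A (zh) * 2020-04-28 2020-09-04 北京明亿科技有限公司 Hash-dictionary-based method and apparatus for extracting item names from police call-handling text
CN112668384A (zh) * 2020-08-07 2021-04-16 深圳市唯特视科技有限公司 Knowledge graph construction method and system, electronic device, and storage medium
CN116235165A (zh) * 2020-08-14 2023-06-06 西门子股份公司 Method and apparatus for intelligently providing recommendation information
CN112084773A (zh) * 2020-08-21 2020-12-15 国网湖北省电力有限公司电力科学研究院 Power grid outage address matching method based on lexicon bidirectional maximum matching
CN112037029B (zh) * 2020-09-01 2024-02-27 中国银行股份有限公司 Method and apparatus for automatically generating bank credit approval questions
CN111933128B (zh) * 2020-09-21 2021-01-12 北京维数统计事务所有限公司 Method and apparatus for processing a questionnaire question bank, and electronic device
CN112966119B (zh) * 2021-02-25 2022-11-25 青岛海信网络科技股份有限公司 Information acquisition method, device, and medium
CN114610860B (zh) * 2022-05-07 2022-09-27 荣耀终端有限公司 Question answering method and system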

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101261690A (zh) * 2008-04-18 2008-09-10 北京百问百答网络技术有限公司 System and method for automatic question generation
US20140280087A1 (en) * 2013-03-15 2014-09-18 International Business Machines Corporation Results of Question and Answer Systems
CN104978396A (zh) * 2015-06-02 2015-10-14 百度在线网络技术(北京)有限公司 Knowledge-base-based method and apparatus for generating question-and-answer items
CN108519998A (zh) * 2018-03-07 2018-09-11 北京云知声信息技术有限公司 Knowledge-graph-based question guidance method and apparatus

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
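CN105095195B (zh) * 2015-07-03 2018-09-18 北京京东尚科信息技术有限公司 Knowledge-graph-based human-machine question answering method and system
CN107680019B (zh) * 2017-09-30 2021-09-24 百度在线网络技术(北京)有限公司 Method, apparatus, device, and storage medium for implementing an examination scheme
CN107945015B (zh) * 2018-01-12 2021-05-11 深圳壹账通智能科技有限公司 Human-machine question-and-answer review method, apparatus, device, and computer-readable storage medium
CN108345690B (zh) * 2018-03-09 2020-11-13 广州杰赛科技股份有限公司 Intelligent question answering method and system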

Also Published As

Publication number Publication date
SG11201913916QA (en) 2020-05-28
CN109543007A (zh) 2019-03-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19873178

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06/08/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19873178

Country of ref document: EP

Kind code of ref document: A1