WO2021174783A1 - Method and apparatus for pushing near-synonyms, electronic device and medium

Method and apparatus for pushing near-synonyms, electronic device and medium

Info

Publication number
WO2021174783A1
Authority
WO
WIPO (PCT)
Prior art keywords
word vector
word
target
preset number
synonyms
Prior art date
Application number
PCT/CN2020/111915
Other languages
English (en)
Chinese (zh)
Inventor
陈林
金戈
徐亮
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021174783A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor, of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/332: Query formulation
    • G06F 16/3329: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/10: Office automation; Time management
    • G06Q 10/105: Human resources
    • G06Q 10/1053: Employment or hiring

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method, device, electronic device, and storage medium for pushing synonyms.
  • In one application scenario, the project requirement is an artificial intelligence (AI) interview rule configuration system, in which users at client companies can update the answer keywords in the expert rules in real time.
  • The inventor realized that when filling in answer keywords, the user must manually enter a large amount of information, and the system offers no assistance, such as synonym recommendation, during keyword input. This reduces the user's writing efficiency, makes the result heavily dependent on the user's personal understanding of the answer keywords, and cannot guarantee that the keywords entered by the user are complete and objective.
  • The first aspect of the present application provides a method for pushing synonyms, the method including: obtaining interview questions; configuring a first preset number of keywords for the answers corresponding to the interview questions; pre-training a target word vector model based on a super-large word vector model; constructing a word vector matrix according to the target word vector model to obtain a word-index file, wherein the word-index file includes the correspondence between word vectors and indexes; constructing a binary tree based on all word vectors in the target word vector model; traversing the binary tree, querying the binary tree for first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and constructing a priority queue based on the first candidate word vectors; deduplicating the first candidate word vectors in the priority queue; obtaining a second preset number of target word vectors from the deduplicated priority queue; and pushing, based on the second preset number of target word vectors and the word-index file, a second preset number of synonyms for the user to select.
  • The second aspect of the present application provides a device for pushing synonyms, the device including:
  • an acquisition module, used to acquire interview questions;
  • a configuration module, used to configure a first preset number of keywords corresponding to the answers to the interview questions;
  • a training module, used to pre-train a target word vector model based on a super-large word vector model;
  • a construction module, configured to construct a word vector matrix according to the target word vector model to obtain a word-index file, wherein the word-index file includes the correspondence between word vectors and indexes;
  • the construction module is also used to construct a binary tree based on all word vectors in the target word vector model;
  • a traversal module, configured to traverse the binary tree, query the binary tree for first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vectors;
  • a deduplication module, configured to deduplicate the first candidate word vectors in the priority queue;
  • the acquisition module is also used to acquire a second preset number of target word vectors from the deduplicated priority queue; and
  • a push module, configured to push, based on the second preset number of target word vectors and the word-index file, a second preset number of synonyms for selection by the user.
  • A third aspect of the present application provides an electronic device, wherein the electronic device includes a processor configured to execute computer-readable instructions stored in a memory to implement the steps of the above method for pushing synonyms, in which a second preset number of synonyms is pushed for the user to select.
  • A fourth aspect of the present application provides a computer-readable storage medium having computer-readable instructions stored thereon, wherein the computer-readable instructions, when executed by a processor, implement the steps of the above method for pushing synonyms, in which a second preset number of synonyms is pushed for the user to select.
  • With the method, device, electronic device, and storage medium for pushing synonyms described in this application, a first preset number of keywords is configured for the answers corresponding to the interview questions, a second preset number of synonyms corresponding to each keyword is searched for in the pre-trained word vector model, and the second preset number of synonyms is pushed for the user to choose from. More synonyms of the answer keywords corresponding to the interview questions can thus be configured during the robot interview process, making it convenient for human resources (HR) staff to configure more comprehensive answers for interview questions when interviewing job applicants. Therefore, when a job applicant's answer to an interview question is received, the answer can be analyzed more accurately, which makes it convenient for human resources to give a more comprehensive assessment of the applicant.
  • FIG. 1 is a flowchart of a method for pushing synonyms provided in Embodiment 1 of the present application.
  • FIG. 2 is a functional module diagram of the push device provided in the second embodiment of the present application.
  • FIG. 3 is a schematic diagram of an electronic device provided in a third embodiment of the present application.
  • The method for pushing synonyms in the embodiments of this application is applied in an electronic device.
  • The synonym push function provided by the method of this application can be integrated directly on the electronic device, or a client implementing the method of this application can be installed on it.
  • Alternatively, the method provided in this application can run on a server or other device in the form of a Software Development Kit (SDK): an interface for the synonym push function is provided in the form of the SDK, and electronic devices or other devices can realize the synonym push function through the provided interface.
  • FIG. 1 is a flowchart of a method for pushing synonyms provided in Embodiment 1 of the present application. According to different requirements, the execution sequence in the flowchart can be changed, and some steps can be omitted.
  • In order that the robot can better determine whether a job applicant answers the interview questions correctly during the interview process, and grade the applicant according to the answer result, it is necessary to configure keywords according to the answers corresponding to the interview questions. After receiving the answer input by the applicant, keywords are extracted from the input answer, the extracted keywords are matched with the configured keywords to obtain a matching result, and the applicant is scored according to the matching result.
  • To this end, this application provides a method for expanding the keywords input by the interviewer when configuring keywords, and pushing synonyms.
  • The method includes the following steps.
  • Step S1: Obtain interview questions.
  • Interview questions are configured according to different positions.
  • For example, interview questions configured for R&D positions include "Which programming languages are you familiar with?", "How do you break out of multiple nested loops in Java?", and "Does Java have memory leaks? Please describe briefly."
  • The robot interview needs to be pre-configured with interview questions and answers.
  • Different job applicants give different answers when facing the same interview questions.
  • Step S2: Configure a first preset number of keywords for the answers corresponding to the interview questions.
  • In one embodiment, the step of configuring the first preset number of keywords for the answers corresponding to the interview questions includes the following.
  • The keywords may be keywords related to a query result, obtained by performing semantic analysis on the query result.
  • In another embodiment, the step of configuring the first preset number of keywords for the answers corresponding to the interview questions includes the following (see the sketch after this list).
  • The topic analysis model can analyze the topic characteristics of the interview question.
  • The topic features may include the topic intention and key information. For example, when the interview question is "What programming languages are you good at?", the intention of the question stem is the programming languages the applicant is good at, and the key information can be "programming language".
  • The pre-established knowledge base may include C/C++, Java, C#, SQL, etc.
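  • The patent gives no concrete implementation for this embodiment; the following is a minimal Python sketch of the idea, assuming a naive substring-based stand-in for the topic analysis model and an illustrative knowledge base (all names here are hypothetical):

```python
# Minimal sketch: derive answer keywords from an interview question by
# matching key information against a pre-established knowledge base.
# KNOWLEDGE_BASE contents and all names are illustrative assumptions.
KNOWLEDGE_BASE = {
    "programming language": ["C/C++", "Java", "C#", "SQL"],
}

def configure_keywords(question: str, first_preset_number: int = 4) -> list:
    """Return up to first_preset_number keywords for the question's answer."""
    q = question.lower()
    for key_info, candidates in KNOWLEDGE_BASE.items():
        if key_info in q:  # naive stand-in for the topic analysis model
            return candidates[:first_preset_number]
    return []

print(configure_keywords("What programming languages are you good at?"))
# ['C/C++', 'Java', 'C#', 'SQL']
```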
  • Step S3: Pre-train a target word vector model based on the super-large word vector model.
  • Pre-training is performed based on the super-large word vector model to obtain a suitable target word vector model (a code sketch follows at the end of this step). Specifically, this includes expanding the robot-interview-scene corpus in the super-large word vector model, which involves segmenting the robot-interview-scene corpus, removing stop words, and incrementally training the word vectors based on the CBOW mode; the super-large word vector model is then trained on the expanded corpus to obtain the target word vector model.
  • The training corpus of the super-large word vector model covers a large number of corpora of different dimensions, such as news, web pages, novels, Baidu Baike, and Wikipedia.
  • However, the corpus for this specific scene in the super-large word vector model is insufficient. Therefore, the robot-interview-scene corpus is integrated on the basis of the super-large word vector model, expanding it with question-and-answer text and similar-question text from robot interviews.
  • The target word vector model is thus a word vector model adapted to the robot interview scenario.
  • The final trained target word vector model covers more than 8 million words, and each word vector has about 200 dimensions. The corpus of the target word vector model is therefore extensive, and each word vector in it can reflect the semantics of its word well. At the same time, the 8-million-word scale can completely replace the traditional approach of constructing a synonym dictionary, and solves the problem of words that cannot be found.
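  • As a concrete illustration of the incremental CBOW training described above, here is a minimal sketch using gensim's word2vec; the file paths, the jieba segmenter, and the stop-word list are assumptions, since the patent does not name a specific toolkit:

```python
import jieba                      # Chinese word segmentation (assumed tool)
from gensim.models import Word2Vec

STOP_WORDS = {"的", "了", "是"}   # illustrative stop-word list

def preprocess(lines):
    """Segment the robot-interview corpus and remove stop words."""
    for line in lines:
        tokens = [t for t in jieba.lcut(line.strip()) if t and t not in STOP_WORDS]
        if tokens:
            yield tokens

# Load the super-large word vector model (assumed to have been trained with
# sg=0, i.e. CBOW mode) and expand it with the interview-scene corpus.
model = Word2Vec.load("super_large_word2vec.model")
with open("robot_interview_corpus.txt", encoding="utf-8") as f:
    sentences = list(preprocess(f))

model.build_vocab(sentences, update=True)      # add new interview vocabulary
model.train(sentences, total_examples=len(sentences), epochs=model.epochs)
model.save("target_word2vec.model")            # the target word vector model
```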
  • Step S4: Construct a word vector matrix according to the target word vector model to obtain a word-index file, wherein the word-index file includes the correspondence between word vectors and indexes.
  • In some embodiments, constructing a word vector matrix according to the target word vector model to obtain a word-index file may include the following.
  • The word vector matrix is a matrix whose number of rows is the total number of words and whose number of columns is the dimension of each word vector.
  • For example, if the dimension of each word is 200 and the target word vector model includes 8 million words, a word vector matrix with 8 million rows and 200 columns is obtained.
  • Each row in the word vector matrix has an index, so the index corresponding to each word can be obtained.
  • The word-index file is output according to the word vector matrix; from it, the correspondence between each index and each word vector can also be obtained.
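  • A minimal sketch of this step, continuing the gensim example above; the pickle file format is an assumption, as the patent only requires that the file record the word-to-index correspondence:

```python
import pickle
import numpy as np
from gensim.models import Word2Vec

model = Word2Vec.load("target_word2vec.model")

words = model.wv.index_to_key            # all words, in index order
matrix = np.asarray(model.wv.vectors)    # shape: (number_of_words, 200)

# Row i of the matrix holds the vector of words[i]; the row number is the index.
word_index = {word: i for i, word in enumerate(words)}
with open("word_index.pkl", "wb") as f:
    pickle.dump(word_index, f)           # the word-index file
```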
  • Step S5: Construct a binary tree based on all word vectors in the target word vector model.
  • Specifically, a binary tree structure is constructed from all word vectors in the target word vector model.
  • Each word vector is a 200-dimensional vector, that is, a point in a 200-dimensional high-dimensional data space.
  • The data space corresponding to all word vectors in the target word vector model can thus be expressed as 8 million points. A binary tree is constructed from the target word vector model as follows: two points are randomly selected as initial nodes, and the equidistant hyperplane between the two initial nodes splits the space into two halves.
  • By recursively repeating this split, the data space is divided into multiple subspaces, and a binary tree structure is constructed from the multiple subspaces.
  • When the number of word vectors in a subspace no longer exceeds k, the subspace is no longer divided.
  • Here k is greater than or equal to 8 and less than or equal to 10; in this embodiment, the value of k is 10.
  • The segmentation condition at each node in the above binary tree structure is one of these equidistant hyperplanes. The binary tree thus includes a root node, multiple intermediate nodes, and a final layer of leaf nodes, where each leaf node represents a word vector.
  • There is no need to save the word vector itself on a leaf node; only the index corresponding to the word vector needs to be saved. In this way, similar word vectors lie close together in the binary tree, which makes the subsequent search for synonyms faster.
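  • The structure described above closely resembles the random-hyperplane trees built by the Annoy library, which likewise stores only item indexes at the leaves. A minimal sketch under that assumption, reusing the matrix from the previous sketch:

```python
from annoy import AnnoyIndex

DIM = 200
tree = AnnoyIndex(DIM, "angular")   # angular distance ~ cosine similarity

# Add every word vector; Annoy stores only the integer index at the leaves,
# matching the word-index file built in step S4.
for i, vector in enumerate(matrix):
    tree.add_item(i, vector)

tree.build(1)                        # a single binary tree, as described here
tree.save("word_vectors.ann")        # stored together with word_index.pkl
```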
  • Step S6: Traverse the binary tree, query the binary tree for first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vectors.
  • The specific method for constructing the priority queue is as follows: take the keyword as the root node of the binary tree; traverse all intermediate nodes under the root node; calculate the distance between the root node and each intermediate node; determine the intermediate nodes whose distance is greater than the preset distance threshold as first-level target nodes; traverse all intermediate nodes under the first-level target nodes down to the last-level leaf nodes; take the word vectors in all of those leaf nodes as the first candidate word vectors; calculate the similarity between each first candidate word vector and the keyword; and insert the first candidate word vectors into the priority queue in order of similarity.
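  • A minimal sketch of the candidate search and the similarity-ordered priority queue; Annoy performs the tree traversal internally, and the patent's distance criterion is read here as a similarity floor, so the threshold and candidate count are illustrative assumptions:

```python
import heapq
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_priority_queue(keyword, n_candidates=100, threshold=0.5):
    """Return a similarity-ordered heap of (neg_similarity, index) candidates."""
    query = model.wv[keyword]                     # keyword as the query point
    candidate_ids = tree.get_nns_by_vector(query, n_candidates)
    heap = []
    for i in candidate_ids:
        sim = cosine_similarity(query, tree.get_item_vector(i))
        if sim >= threshold:                      # keep sufficiently close words
            heapq.heappush(heap, (-sim, i))       # negate: heapq is a min-heap
    return heap
```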
  • Step S7: Deduplicate the first candidate word vectors in the priority queue.
  • Step S8: Obtain a second preset number of target word vectors from the top of the deduplicated priority queue.
  • Step S9: Push, based on the second preset number of target word vectors and the word-index file, a second preset number of synonyms for selection by the user.
  • In some embodiments, pushing a second preset number of synonyms for the user to select based on the second preset number of target word vectors and the word-index file includes: obtaining the target index corresponding to each of the second preset number of target word vectors; querying the word corresponding to each target index according to the word-index file; and pushing the corresponding synonyms for the user to select.
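  • A minimal end-to-end sketch of steps S7 to S9, reusing the helpers from the previous sketches; the parameter defaults are illustrative:

```python
import heapq

def push_synonyms(keyword, second_preset_number=8):
    """Deduplicate the queue, take the top N, and map indexes back to words."""
    heap = build_priority_queue(keyword)
    index_to_word = {i: w for w, i in word_index.items()}

    seen, results = set(), []
    while heap and len(results) < second_preset_number:
        _, i = heapq.heappop(heap)                 # most similar candidate first
        word = index_to_word[i]
        if word == keyword or word in seen:        # step S7: deduplication
            continue
        seen.add(word)
        results.append(word)                       # steps S8-S9: top-N push
    return results

print(push_synonyms("Java"))
```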
  • The binary tree structure file and the word-index file are stored together; when the top-N neighbor vocabulary of a keyword needs to be queried, only these two files are needed for indexing.
  • The synonym search function supported by this application is innovative and convenient: it can generate synonyms for 5 keywords at a time and push 8 synonyms per round, and it supports the user clicking "change batch" to replace them with another round of 8 synonyms, which is convenient to view and use. For example, a "change batch" button is displayed on the push interface; after the user clicks the button, the original synonyms are replaced and further synonyms are pushed.
  • In some embodiments, a preset rule is added to filter the queried vocabulary (see the sketch after this list), wherein the preset rule includes at least one of the following rules:
  • The type of a word includes Chinese, English, and numbers; for example, vocabulary whose type is consistent with the keyword's type is preferentially returned.
  • If Chinese input returns English, or English input returns Chinese, the result is returned normally.
  • If Chinese or English input returns numbers, the synonym is deleted directly. It should be noted that a single English letter or a single Chinese character counts as 1 character.
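  • A minimal sketch of the type-filtering rule as read above; the regular expressions used for type detection are illustrative assumptions:

```python
import re

def word_type(word: str) -> str:
    """Classify a word as number, english, or chinese (illustrative rule)."""
    if re.fullmatch(r"[0-9]+", word):
        return "number"
    if re.fullmatch(r"[A-Za-z0-9/#+.\-]+", word):
        return "english"
    return "chinese"

def filter_synonyms(keyword: str, candidates: list) -> list:
    kept = [w for w in candidates if word_type(w) != "number"]   # drop numbers
    same = [w for w in kept if word_type(w) == word_type(keyword)]
    other = [w for w in kept if word_type(w) != word_type(keyword)]
    return same + other   # same-type vocabulary is returned preferentially
```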
  • In summary, the synonym push method provided in this application includes: obtaining interview questions; configuring a first preset number of keywords corresponding to the answers to the interview questions; searching the pre-trained word vector model for a second preset number of synonyms corresponding to each keyword; and pushing the second preset number of synonyms for the user to choose from.
  • The word vectors used in this application have wide coverage, and the vector dimension characterizing each word is 200.
  • The vector of each word can reflect its actual semantics well; the word vector model of this application includes 8 million words, which solves the traditional out-of-vocabulary problem very well.
  • The word vector model used in this application greatly reduces memory usage by adopting the word-index file, and greatly increases system stability.
  • In addition, this application can configure more synonyms of the keywords corresponding to the answers of the interview questions during the robot interview process. Therefore, when a job applicant's answer to an interview question is received, the answer can be analyzed more accurately, which makes it convenient for human resources to give a more comprehensive assessment of the applicant.
  • FIG. 2 is a diagram of the functional modules in a preferred embodiment of the device for pushing synonyms of this application.
  • the synonym pushing device 20 (referred to as “pushing device” for ease of description) runs in an electronic device.
  • the pushing device 20 may include multiple functional modules composed of program code segments.
  • the program code of each program segment in the pushing device 20 can be stored in a memory and executed by at least one processor to perform the function of pushing synonyms.
  • the functional modules of the pushing device 20 may include: an acquisition module 201, a configuration module 202, a training module 203, a construction module 204, a traversal module 205, a deduplication module 206, and a pushing module 207.
  • the function of each module will be detailed in the subsequent embodiments.
  • A module referred to in this application is a series of computer program segments that can be executed by at least one processor, can complete a fixed function, and is stored in a memory.
  • The obtaining module 201 is used to obtain interview questions.
  • Interview questions are configured according to different positions.
  • For example, interview questions configured for R&D positions include "Which programming languages are you familiar with?", "How do you break out of multiple nested loops in Java?", and "Does Java have memory leaks? Please describe briefly."
  • The robot interview needs to be pre-configured with interview questions and answers.
  • Different job applicants give different answers when facing the same interview questions.
  • The configuration module 202 is configured to configure a first preset number of keywords for the answers corresponding to the interview questions.
  • In one embodiment, configuring the first preset number of keywords for the answers corresponding to the interview questions includes the following.
  • The keywords may be keywords related to a query result, obtained by performing semantic analysis on the query result.
  • In another embodiment, configuring the first preset number of keywords for the answers corresponding to the interview questions includes the following.
  • The topic analysis model can analyze the topic characteristics of the interview question.
  • The topic features may include the topic intention and key information. For example, when the interview question is "What programming languages are you good at?", the intention of the question stem is the programming languages the applicant is good at, and the key information can be "programming language".
  • The pre-established knowledge base may include C/C++, Java, C#, SQL, etc.
  • The training module 203 is used to pre-train a target word vector model based on the super-large word vector model.
  • Pre-training is performed based on the super-large word vector model to obtain a suitable target word vector model. Specifically, this includes expanding the robot-interview-scene corpus in the super-large word vector model, which involves segmenting the robot-interview-scene corpus, removing stop words, and incrementally training the word vectors based on the CBOW mode; the super-large word vector model is then trained on the expanded corpus to obtain the target word vector model.
  • The training corpus of the super-large word vector model covers a large number of corpora of different dimensions, such as news, web pages, novels, Baidu Baike, and Wikipedia.
  • However, the corpus for this specific scene in the super-large word vector model is insufficient. Therefore, the robot-interview-scene corpus is integrated on the basis of the super-large word vector model, expanding it with question-and-answer text and similar-question text from robot interviews.
  • The target word vector model is thus a word vector model adapted to the robot interview scenario.
  • The final trained target word vector model covers more than 8 million words, and each word vector has about 200 dimensions. The corpus of the target word vector model is therefore extensive, and each word vector in it can reflect the semantics of its word well. At the same time, the 8-million-word scale can completely replace the traditional approach of constructing a synonym dictionary, and solves the problem of words that cannot be found.
  • The construction module 204 is configured to construct a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between word vectors and indexes.
  • In some embodiments, constructing a word vector matrix according to the target word vector model to obtain a word-index file may include the following.
  • The word vector matrix is a matrix whose number of rows is the total number of words and whose number of columns is the dimension of each word vector.
  • For example, if the dimension of each word is 200 and the target word vector model includes 8 million words, a word vector matrix with 8 million rows and 200 columns is obtained.
  • Each row in the word vector matrix has an index, so the index corresponding to each word can be obtained.
  • The word-index file is output according to the word vector matrix; from it, the correspondence between each index and each word vector can also be obtained.
  • The construction module 204 is also used to construct a binary tree based on all word vectors in the target word vector model.
  • Specifically, a binary tree structure is constructed from all word vectors in the target word vector model.
  • Each word vector is a 200-dimensional vector, that is, a point in a 200-dimensional high-dimensional data space.
  • The data space corresponding to all word vectors in the target word vector model can thus be expressed as 8 million points. A binary tree is constructed from the target word vector model as follows.
  • Two points are randomly selected as initial nodes, and the equidistant hyperplane between the two initial nodes splits the space into two halves.
  • By recursively repeating this split, the data space is divided into multiple subspaces, and a binary tree structure is constructed from the multiple subspaces.
  • When the number of word vectors in a subspace no longer exceeds k, the subspace is no longer divided.
  • Here k is greater than or equal to 8 and less than or equal to 10; in this embodiment, the value of k is 10.
  • The segmentation condition at each node in the above binary tree structure is one of these equidistant hyperplanes; finally, the word vectors are the leaf nodes of the binary tree.
  • The traversal module 205 is configured to traverse the binary tree, query the binary tree for first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vectors.
  • The specific method for constructing the priority queue is as follows: take the keyword as the root node of the binary tree; traverse all intermediate nodes under the root node; calculate the distance between the root node and each intermediate node; determine the intermediate nodes whose distance is greater than the preset distance threshold as first-level target nodes; traverse all intermediate nodes under the first-level target nodes down to the last-level leaf nodes; take the word vectors in all of those leaf nodes as the first candidate word vectors; calculate the similarity between each first candidate word vector and the keyword; and insert the first candidate word vectors into the priority queue in order of similarity.
  • The deduplication module 206 is configured to deduplicate the first candidate word vectors in the priority queue.
  • The acquiring module 201 is also used to acquire a second preset number of target word vectors from the top of the deduplicated priority queue.
  • The pushing module 207 is configured to push, based on the second preset number of target word vectors and the word-index file, a second preset number of synonyms for selection by the user.
  • In some embodiments, pushing a second preset number of synonyms for the user to select based on the second preset number of target word vectors and the word-index file includes: obtaining the target index corresponding to each of the second preset number of target word vectors; querying the word corresponding to each target index according to the word-index file; and pushing the corresponding synonyms for the user to select.
  • The binary tree structure file and the word-index file are stored together; when the top-N neighbor vocabulary of a keyword needs to be queried, only these two files are needed for indexing.
  • The synonym search function supported by this application is innovative and convenient: it can generate synonyms for 5 keywords at a time and push 8 synonyms per round, and it supports the user clicking "change batch" to replace them with another round of 8 synonyms, which is convenient to view and use. For example, a "change batch" button is displayed on the push interface; after the user clicks the button, the original synonyms are replaced and further synonyms are pushed.
  • In some embodiments, a preset rule is added to filter the queried vocabulary, wherein the preset rule includes at least one of the following rules:
  • The type of a word includes Chinese, English, and numbers; for example, vocabulary whose type is consistent with the keyword's type is preferentially returned.
  • If Chinese input returns English, or English input returns Chinese, the result is returned normally.
  • If Chinese or English input returns numbers, the synonym is deleted directly. It should be noted that a single English letter or a single Chinese character counts as 1 character.
  • The aforementioned pushing device 20 can thus realize the pushing of synonyms.
  • The push device 20 described in this application includes an acquisition module 201, a configuration module 202, a training module 203, a construction module 204, a traversal module 205, a deduplication module 206, and a push module 207.
  • The acquisition module 201 is used to obtain interview questions; the configuration module 202 is used to configure a first preset number of keywords corresponding to the answers to the interview questions; the training module 203 is used to pre-train a target word vector model based on the super-large word vector model; the construction module 204 is configured to construct a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between word vectors and indexes; the construction module 204 is also used to construct a binary tree based on all word vectors in the target word vector model; the traversal module 205 is used to traverse the binary tree, query the binary tree for first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vectors; the deduplication module 206 is configured to deduplicate the first candidate word vectors in the priority queue; the acquisition module 201 is also configured to acquire a second preset number of target word vectors from the deduplicated priority queue; and the push module 207 is configured to push a second preset number of synonyms for selection by the user.
  • The word vectors used in this application have wide coverage, and the vector dimension characterizing each word is 200.
  • The vector of each word can reflect its actual semantics well; the word vector model of this application includes 8 million words, which solves the traditional out-of-vocabulary problem very well.
  • The word vector model used in this application greatly reduces memory usage by adopting the word-index file, and greatly increases system stability.
  • The query return speed of this application is also greatly increased: a query for one word used to take about ten seconds, but is now reduced to less than 0.01 s.
  • In addition, this application can configure more synonyms of the keywords corresponding to the answers of the interview questions during the robot interview process.
  • The above-mentioned integrated unit implemented in the form of a software function module may be stored in a computer-readable storage medium.
  • The above-mentioned software function modules are stored in a storage medium and include several instructions for making a computer device (which can be a personal computer, a dual-screen device, a network device, etc.) or a processor execute parts of the methods of the various embodiments of this application.
  • FIG. 3 is a schematic diagram of the electronic device provided in the third embodiment of the application.
  • The electronic device 3 includes a memory 31, at least one processor 32, a computer program 33 stored in the memory 31 and runnable on the at least one processor 32, at least one communication bus 34, and a database 35.
  • The computer program 33 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 31 and executed by the at least one processor 32 to complete this application.
  • The one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the computer-readable instruction segments are used to describe the execution process of the computer program 33 in the electronic device 3.
  • The electronic device 3 may be a mobile phone, a tablet computer, a personal digital assistant (PDA), or another device with installed applications.
  • The schematic diagram in FIG. 3 is only an example of the electronic device 3 and does not constitute a limitation on the electronic device 3.
  • The electronic device 3 may also include input and output devices, network access devices, buses, and so on.
  • The at least one processor 32 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • The processor 32 may be a microprocessor, or the processor 32 may be any conventional processor, etc.
  • The processor 32 is the control center of the electronic device 3 and connects the various parts of the entire electronic device 3 with various interfaces and lines.
  • The memory 31 may be used to store the computer program 33 and/or the modules/units.
  • The processor 32 runs or executes the computer programs and/or modules/units stored in the memory 31 and calls the data stored in the memory 31 to realize the various functions of the electronic device 3.
  • The memory 31 may mainly include a storage program area and a storage data area.
  • The storage program area may store an operating system, an application program required by at least one function (such as a sound playback function or an image playback function), etc.; the storage data area may store data created according to the use of the electronic device 3 (such as audio data), and the like.
  • The memory 31 may include volatile memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, high-speed random access memory, or another storage device.
  • The memory 31 stores program codes, and the at least one processor 32 can call the program codes stored in the memory 31 to perform related functions.
  • The various modules described in FIG. 2 (the acquisition module 201, configuration module 202, training module 203, construction module 204, traversal module 205, deduplication module 206, and push module 207) are program codes stored in the memory 31; the program codes are executed by the at least one processor 32 to realize the functions of the various modules and achieve the purpose of pushing synonyms.
  • The obtaining module 201 is used to obtain interview questions;
  • the configuration module 202 is used to configure a first preset number of keywords corresponding to the answers to the interview questions;
  • the training module 203 is used to pre-train a target word vector model based on the super-large word vector model;
  • the construction module 204 is configured to construct a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between word vectors and indexes;
  • the construction module 204 is further configured to construct a binary tree based on all word vectors in the target word vector model;
  • the traversal module 205 is configured to traverse the binary tree, query the binary tree for first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vectors;
  • the deduplication module 206 is configured to deduplicate the first candidate word vectors in the priority queue;
  • the acquiring module 201 is also used to acquire a second preset number of target word vectors from the deduplicated priority queue;
  • the pushing module 207 is configured to push, based on the second preset number of target word vectors and the word-index file, a second preset number of synonyms for selection by the user.
  • The database 35 is a warehouse built on the electronic device 3 that organizes, stores, and manages data according to a data structure. Databases are usually divided into three types: hierarchical databases, network databases, and relational databases. In this embodiment, the database 35 is used to store information such as the interview questions.
  • If the integrated module/unit of the electronic device 3 is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • Based on this understanding, this application implements all or part of the processes in the methods of the above embodiments, which can also be completed by instructing the relevant hardware through a computer program.
  • The computer program can be stored in a computer-readable storage medium.
  • The computer program includes computer-readable instruction code.
  • The computer-readable instruction code may be in the form of source code, object code, an executable file, or some intermediate form.
  • The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), etc.
  • The functional units in the various embodiments of the present application may be integrated in the same processing unit, or each unit may exist alone physically, or two or more units may be integrated in the same unit.
  • The above-mentioned integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

Disclosed is a method for pushing near-synonyms, comprising: obtaining an interview question (S1); configuring a first preset number of keywords for the answer corresponding to the interview question (S2); performing pre-training on the basis of a super-large word vector model to obtain a target word vector model (S3); constructing a word vector matrix according to the target word vector model to obtain a word-index file (S4); constructing a binary tree on the basis of all word vectors (S5); traversing the binary tree, querying the binary tree for first candidate word vectors whose distances to the keywords are greater than a preset distance threshold, and constructing a priority queue (S6); deduplicating the first candidate word vectors in the priority queue (S7); obtaining a second preset number of target word vectors at the top of the deduplicated priority queue (S8); and pushing, on the basis of the target word vectors and the word-index file, a second preset number of near-synonyms for selection by a user (S9). Also disclosed are an apparatus for pushing near-synonyms, an electronic device, and a storage medium. The present invention enables near-synonyms to be pushed to users quickly.
PCT/CN2020/111915 2020-03-02 2020-08-27 Method and apparatus for pushing near-synonyms, electronic device and medium WO2021174783A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010136905.7A CN111460798A (zh) 2020-03-02 2020-03-02 近义词推送方法、装置、电子设备及介质
CN202010136905.7 2020-03-02

Publications (1)

Publication Number Publication Date
WO2021174783A1 (fr) 2021-09-10

Family

ID=71684962

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/111915 WO2021174783A1 (fr) 2020-03-02 2020-08-27 Method and apparatus for pushing near-synonyms, electronic device and medium

Country Status (2)

Country Link
CN (1) CN111460798A (fr)
WO (1) WO2021174783A1 (fr)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460798A (zh) * 2020-03-02 2020-07-28 平安科技(深圳)有限公司 近义词推送方法、装置、电子设备及介质
CN112434188B (zh) * 2020-10-23 2023-09-05 杭州未名信科科技有限公司 一种异构数据库的数据集成方法、装置及存储介质
CN112232065B (zh) * 2020-10-29 2024-05-14 腾讯科技(深圳)有限公司 挖掘同义词的方法及装置
CN112906895B (zh) * 2021-02-09 2022-12-06 柳州智视科技有限公司 一种题目对象仿造的方法
CN113095165A (zh) * 2021-03-23 2021-07-09 北京理工大学深圳研究院 一种用于完善面试表现的模拟面试方法和装置
CN113722452B (zh) * 2021-07-16 2024-01-19 上海通办信息服务有限公司 一种问答系统中基于语义的快速知识命中方法及装置
CN117112736B (zh) * 2023-10-24 2024-01-05 云南瀚文科技有限公司 一种基于语义分析模型的信息检索分析方法及系统

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101593206A (zh) * 2009-06-25 2009-12-02 腾讯科技(深圳)有限公司 基于问答互动平台中答案的搜索方法及装置
WO2018149326A1 (fr) * 2017-02-16 2018-08-23 阿里巴巴集团控股有限公司 Procédé et appareil de réponse à une question en langage naturel et serveur
CN109635094A (zh) * 2018-12-17 2019-04-16 北京百度网讯科技有限公司 用于生成答案的方法和装置
CN109947922A (zh) * 2019-03-20 2019-06-28 浪潮商用机器有限公司 一种问答处理方法、装置及问答系统
CN111460798A (zh) * 2020-03-02 2020-07-28 平安科技(深圳)有限公司 近义词推送方法、装置、电子设备及介质

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113806311A (zh) * 2021-09-17 2021-12-17 平安普惠企业管理有限公司 基于深度学习的文件分类方法、装置、电子设备及介质
CN113806311B (zh) * 2021-09-17 2023-08-29 深圳市深可信科学技术有限公司 基于深度学习的文件分类方法、装置、电子设备及介质
CN113792133A (zh) * 2021-11-11 2021-12-14 北京世纪好未来教育科技有限公司 判题方法、装置、电子设备和介质
CN113792133B (zh) * 2021-11-11 2022-04-29 北京世纪好未来教育科技有限公司 判题方法、装置、电子设备和介质
CN114742042A (zh) * 2022-03-22 2022-07-12 杭州未名信科科技有限公司 一种文本去重方法、装置、电子设备及存储介质
CN115168661A (zh) * 2022-08-31 2022-10-11 深圳市一号互联科技有限公司 原生图数据处理方法、装置、设备及存储介质
CN115630613A (zh) * 2022-12-19 2023-01-20 长沙冉星信息科技有限公司 一种问卷调查中评价类问题的自动编码系统及其方法
CN115630613B (zh) * 2022-12-19 2023-04-07 长沙冉星信息科技有限公司 一种问卷调查中评价类问题的自动编码系统及其方法

Also Published As

Publication number Publication date
CN111460798A (zh) 2020-07-28

Similar Documents

Publication Publication Date Title
WO2021174783A1 Method and apparatus for pushing near-synonyms, electronic device and medium
CN109670163B (zh) 信息识别方法、信息推荐方法、模板构建方法及计算设备
US10146862B2 (en) Context-based metadata generation and automatic annotation of electronic media in a computer network
US9201931B2 (en) Method for obtaining search suggestions from fuzzy score matching and population frequencies
CN117235226A (zh) 一种基于大语言模型的问题应答方法及装置
CN111339277A (zh) 基于机器学习的问答交互方法及装置
CN108875743B (zh) 一种文本识别方法及装置
CN112559709A (zh) 基于知识图谱的问答方法、装置、终端以及存储介质
US20230030086A1 (en) System and method for generating ontologies and retrieving information using the same
CN114547253A (zh) 一种基于知识库应用的语义搜索方法
US11360953B2 (en) Techniques for database entries de-duplication
US10073890B1 (en) Systems and methods for patent reference comparison in a combined semantical-probabilistic algorithm
CN113641833A (zh) 服务需求匹配方法及装置
US20170124090A1 (en) Method of discovering and exploring feature knowledge
TW202123026A (zh) 資料歸檔方法、裝置、電腦裝置及存儲介質
CN115982346A (zh) 一种问答库构建方法、终端设备及存储介质
CN113127617A (zh) 通用领域知识图谱的知识问答方法、终端设备及存储介质
CN109684357B (zh) 信息处理方法及装置、存储介质、终端
CN117076636A (zh) 一种智能客服的信息查询方法、系统和设备
US20180113908A1 (en) Transforming and evaluating missing values in graph databases
CN110147358B (zh) 自动问答知识库的建设方法及建设系统
CN115114420A (zh) 一种知识图谱问答方法、终端设备及存储介质
CN112989011B (zh) 数据查询方法、数据查询装置和电子设备
CN113761213B (zh) 一种基于知识图谱的数据查询系统、方法及终端设备
CN113590792A (zh) 用户问题的处理方法、装置和服务器

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20923243

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20923243

Country of ref document: EP

Kind code of ref document: A1