CN110688838B

CN110688838B - Idiom synonym list generation method and device

Info

Publication number: CN110688838B
Application number: CN201910950701.4A
Authority: CN
Inventors: 刘晓楠; 李长亮; 汪美玲; 郭昱
Original assignee: Beijing Kingsoft Digital Entertainment Co Ltd; Chengdu Kingsoft Digital Entertainment Co Ltd
Current assignee: Beijing Kingsoft Digital Entertainment Co Ltd; Chengdu Kingsoft Digital Entertainment Co Ltd
Priority date: 2019-10-08
Filing date: 2019-10-08
Publication date: 2023-07-18
Anticipated expiration: 2039-10-08
Also published as: CN110688838A

Abstract

The application provides a method and a device for generating an idiom synonym list. The method comprises: acquiring a question sentence input by a user, and identifying a target idiom in the question sentence; acquiring, from a preset idiom knowledge graph, at least one candidate idiom having the same feature label as the target idiom, and generating an idiom recommendation list corresponding to the at least one candidate idiom; calculating the similarity between the word embedding vector of the target idiom and the word embedding vector of each candidate idiom in the idiom recommendation list, to obtain a similarity value between each candidate idiom and the target idiom; and screening the candidate idioms in the idiom recommendation list according to these similarity values, so as to obtain an idiom recommendation list containing only candidate idioms that are synonymous with the target idiom.

Description

Idiom synonym list generation method and device

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a method, an apparatus, a computing device, and a computer-readable storage medium for generating an idiom synonym list.

Background

Existing online idiom dictionaries mainly provide information such as an idiom's pronunciation, meaning, origin, near-synonyms and antonyms, and are generally organized and stored in relational databases. On this basis, a user who wants related synonyms must first search for a specific idiom, view its related information, open the near-synonym links in the returned information, and compare the interpretation of each related idiom with that of the specific idiom to judge whether the two are synonyms. Meanwhile, current Chinese synonym technology is mainly applied in fields such as information retrieval, teaching Chinese as a foreign language, and professional terminology; in the idiom field, the available resources are mostly manually annotated near-synonym relations that only partly contain synonym relations.

When an ordinary user needs to look up synonyms of a specific idiom while writing, the user has to switch to a third-party tool such as a search engine or a dictionary. At present, such tools mainly return related information for the input idiom and provide only links to near-synonyms with similar meanings; they do not provide synonyms with the same meaning as the idiom. The user must open the near-synonym links in the idiom information, compare the meaning of the original idiom with that of each near-synonym, and judge whether they are synonyms. The user therefore has to do considerable screening of the idioms returned by the tool, which seriously interrupts the user's train of thought while writing, makes the required information harder to obtain, and reduces the accuracy with which it is obtained.

Disclosure of Invention

In view of this, embodiments of the present disclosure provide a method, an apparatus, a computing device, and a computer-readable storage medium for generating an idiom synonym list, so as to solve the technical drawbacks in the prior art.

According to a first aspect of embodiments of the present disclosure, there is provided a method for generating an idiom synonym list, including:

acquiring a question sentence input by a user, and identifying a target idiom in the question sentence;

acquiring, from a preset idiom knowledge graph, at least one candidate idiom having the same feature label as the target idiom, and generating an idiom recommendation list corresponding to the at least one candidate idiom;

calculating the similarity between the word embedding vector of the target idiom and the word embedding vector of each candidate idiom in the idiom recommendation list, to obtain a similarity value between each candidate idiom and the target idiom; and

screening the candidate idioms in the idiom recommendation list according to the similarity value between each candidate idiom and the target idiom, to obtain an idiom recommendation list containing only candidate idioms that are synonymous with the target idiom.

According to a second aspect of embodiments of the present specification, there is provided a device for generating an idiom synonym list, including:

an idiom recognition module, configured to acquire a question sentence input by a user and identify a target idiom in the question sentence;

a list generation module, configured to acquire, from a preset idiom knowledge graph, at least one candidate idiom having the same feature label as the target idiom, and generate an idiom recommendation list corresponding to the at least one candidate idiom;

a similarity calculation module, configured to calculate the similarity between the word embedding vector of the target idiom and the word embedding vector of each candidate idiom in the idiom recommendation list, to obtain a similarity value between each candidate idiom and the target idiom; and

a list screening module, configured to screen the candidate idioms in the idiom recommendation list according to the similarity value between each candidate idiom and the target idiom, to obtain an idiom recommendation list containing only candidate idioms that are synonymous with the target idiom.

According to a third aspect of embodiments of the present specification, there is provided a computing device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the method for generating an idiom synonym list.

According to a fourth aspect of embodiments of the present specification, there is provided a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method for generating an idiom synonym list.

This application addresses a pain point of users during writing: the subtle differences between synonyms and near-synonyms are hard to tell apart. Feature labels in the idiom knowledge graph are used to generalize idiom elements, and the similarity between idiom embedding vectors is then calculated, so that accurate synonyms are provided to the user. The returned idiom recommendation list consists of idioms that can replace the target idiom in any context. The user can therefore ask a question directly in the writing tool without switching to a third-party tool, and does not need to tell synonyms from near-synonyms or judge whether the idioms in the generated list can be substituted for one another. This shortens the path to selecting an idiom and ensures the accuracy of the selection.

Drawings

FIG. 1 is a block diagram of a computing device provided by an embodiment of the present application;

FIG. 2 is a flowchart of a method for generating an idiom synonym list provided by an embodiment of the present application;

FIG. 3 is another flowchart of a method for generating an idiom synonym list provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a method for generating an idiom synonym list according to an embodiment of the present disclosure;

FIG. 5 is another flowchart of a method for generating an idiom synonym list provided by an embodiment of the present application;

FIG. 6 is another flowchart of a method for generating an idiom synonym list provided by an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a device for generating an idiom synonym list provided in an embodiment of the present application.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. This application can, however, be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the application; the application is therefore not limited to the specific embodiments disclosed below.

The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any or all possible combinations of one or more of the associated listed items.

It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of one or more embodiments of this specification, a first may also be referred to as a second, and similarly, a second may also be referred to as a first. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".

First, terms related to one or more embodiments of the present invention will be explained.

Knowledge graph: a knowledge graph describes the entities or concepts that exist in the real world and the relations among them, forming a huge semantic network graph in which nodes represent entities or concepts and edges represent attributes or relations.

Entity: an entity refers to something that is distinguishable and exists independently, such as a person name, a city name, a plant name, a commodity name, etc., the entity is the most basic element in a knowledge graph, and different relationships exist between different entities.

Attribute: an attribute edge points from an entity to its attribute value, and different attribute types correspond to different types of attribute edges. An attribute is characteristic information of an object; for example, "area", "population" and "capital" are different attributes. An attribute value is the value of an attribute, for example 9.6 million square kilometers.

Relationship: on the knowledge graph, the relationship is a function of mapping several graph nodes (entity, semantic class, attribute value) to boolean values.

Triple: the triple is the general representation used in a knowledge graph; its basic forms mainly include (head entity, relation, tail entity) and (entity, attribute, attribute value).

Pattern matching algorithm: pattern matching is a basic string operation in data structures. Given a substring P and a string T to be searched, the task is to find all substrings of T that are identical to P; this is the pattern matching problem, where P is called the pattern and T is called the target. If T contains one or more substrings equal to P, the positions of those substrings in T are returned and the match is said to succeed; otherwise the match fails. There are many pattern matching algorithms, of which the better-known include the KMP algorithm, the BM (Boyer-Moore) algorithm, the Sunday algorithm, and the Horspool algorithm.
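As an illustration of the pattern matching problem defined above, the following is a minimal Python sketch of the KMP algorithm. It returns every position in the target T at which the pattern P occurs, so an empty result corresponds to a failed match. The function names are hypothetical, and the patent does not state which of the listed algorithms it actually uses.

```python
from __future__ import annotations


def build_failure_table(pattern: str) -> list[int]:
    """fail[i] is the length of the longest proper prefix of pattern[:i + 1]
    that is also its suffix; KMP uses it to decide how far to fall back."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(target: str, pattern: str) -> list[int]:
    """Return all start positions of pattern (P) in target (T); [] means the match failed."""
    if not pattern:
        return []
    fail = build_failure_table(pattern)
    positions = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(target):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # fall back without re-reading characters of the target
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):  # a full match ends at position i
            positions.append(i - k + 1)
            k = fail[k - 1]  # continue so that overlapping matches are also found
    return positions


if __name__ == "__main__":
    print(kmp_search("明修栈道，暗度陈仓；暗度陈仓", "暗度陈仓"))
```

Because the failure table lets the scan avoid re-reading characters of T, the search runs in O(|T| + |P|) time.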

Morpheme: a morpheme is the smallest unit that combines sound and meaning, that is, the smallest meaningful unit of language. Morphemes are not language units that can be used independently; their primary function is to serve as the material from which words are built. Because a morpheme must pair sound with meaning, syllables that carry no meaning on their own cannot be regarded as morphemes, for example each syllable of "wonton" (馄饨) taken alone. Being the smallest meaningful unit but not an independently usable one, a morpheme is distinct from a word.

This application provides a method, an apparatus, a computing device, and a computer-readable storage medium for generating an idiom synonym list, which are described in detail in the following embodiments.

Fig. 1 shows a block diagram of a computing device 100 according to an embodiment of the present description. The components of the computing device 100 include, but are not limited to, a memory 110 and a processor 120. Processor 120 is coupled to memory 110 via bus 130 and database 150 is used to store data.

Computing device 100 also includes access device 140, access device 140 enabling computing device 100 to communicate via one or more networks 160. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. The access device 140 may include one or more of any type of network interface, wired or wireless (e.g., a Network Interface Card (NIC)), such as an IEEE802.11 Wireless Local Area Network (WLAN) wireless interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a bluetooth interface, a Near Field Communication (NFC) interface, and so forth.

In one embodiment of the present description, the above-described components of computing device 100, as well as other components not shown in FIG. 1, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device shown in FIG. 1 is for exemplary purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.

Computing device 100 may be any type of stationary or mobile computing device including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 100 may also be a mobile or stationary server.

The processor 120 may perform the steps of the method shown in FIG. 2. FIG. 2 is a schematic flowchart of a method for generating an idiom synonym list according to an embodiment of the present application, including steps 202 to 208.

Step 202: Acquire a question sentence input by a user, and identify a target idiom in the question sentence.

In one or more embodiments of the present application, when a user entering text on a terminal device needs to find a synonym for a specific target idiom, the user can ask the system a question directly in the writing tool. The system acquires the question sentence input by the user and identifies in it the target idiom for which the user wants synonyms. For example, if the user wants a synonymous idiom to replace the target idiom 暗度陈仓 ("secretly crossing at Chencang"), the user can ask the system for "synonyms of 暗度陈仓"; the system acquires this question sentence and identifies the target idiom 暗度陈仓 in it.

Step 204: Acquire, from a preset idiom knowledge graph, at least one candidate idiom having the same feature label as the target idiom, and generate an idiom recommendation list corresponding to the at least one candidate idiom.

In one or more embodiments of the present application, the system builds the idiom knowledge graph with a feature-label-based construction method. After acquiring the target idiom from the user's question, the system uses the feature labels already annotated in the idiom knowledge graph to match at least one candidate idiom having the same feature label as the target idiom, and generates an idiom recommendation list corresponding to the at least one candidate idiom. This ensures that the candidate idioms share the main morpheme of the target idiom, while the associations between feature labels distinguish synonyms of the target idiom from its near-synonyms. For example, the target idiom 暗度陈仓 ("secretly crossing at Chencang") in the user's question sentence has synonyms such as "bright trim" and "dark fall wave", near-synonyms such as "surreptitious day" and "moving flower", and antonyms such as "open-eye" and "opposite-eye". Idioms related to the target idiom may therefore share feature labels with it while having either the same meaning or only a similar one, and the feature labels are what keep the two apart.

Step 206: Calculate the similarity between the word embedding vector of the target idiom and the word embedding vector of each candidate idiom in the idiom recommendation list, to obtain a similarity value between each candidate idiom and the target idiom.

In one or more embodiments of the present application, the system uses word embedding vectors to calculate the similarity between the target idiom and each candidate idiom in the idiom recommendation list, thereby measuring how close in meaning each candidate idiom is to the target idiom.

Step 208: Screen the candidate idioms in the idiom recommendation list according to the similarity value between each candidate idiom and the target idiom, to obtain an idiom recommendation list containing only candidate idioms that are synonymous with the target idiom.

In one or more embodiments of the present application, the system screens the candidate idioms in the idiom recommendation list according to how similar each one is to the target idiom, and removes from the list the suspected near-synonyms whose similarity does not meet the requirement, so as to obtain an idiom recommendation list containing only candidate idioms that are synonyms of the target idiom.

In the above embodiment, after the idiom recommendation list containing only candidate idioms synonymous with the target idiom is obtained, the method further includes:

returning, to the user, the idiom recommendation list containing the candidate idioms synonymous with the target idiom.

In one or more embodiments of the present application, after generating the idiom recommendation list containing only candidate idioms that are synonyms of the target idiom, the system returns the list to the user, so that the user obtains the candidate idioms synonymous with the target idiom.

In the above embodiment, as shown in FIG. 3, steps 302 to 306 are further included before the question sentence input by the user is acquired:

Step 302: Obtain structured data from a preset corpus database, where the structured data includes a plurality of idiom entities, a plurality of feature labels, idiom attribute information, semantic relation information among the idiom entities, and label relation information between the idiom entities and the feature labels.

In one or more embodiments of the present application, the system may obtain the structured data from an existing corpus database, such as an online encyclopedia, an online dictionary, or a specialized database. The structured data includes a plurality of idiom entities, a plurality of feature labels, idiom attribute information, semantic relation information among the idiom entities, and label relation information between the idiom entities and the feature labels, where the semantic relation information includes synonym relations, near-synonym relations, antonym relations, and the like.
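Steps 302 and 304 can be pictured with a small in-memory graph. The sketch below is an assumption rather than the patent's storage format: it treats the structured data as (head, relation, tail) triples, uses hypothetical relation names ("synonym", "near_synonym", "antonym", "has_label"), reads every other relation as an (entity, attribute, attribute value) triple, and uses placeholder idiom entities A to D.

```python
from collections import defaultdict

# Hypothetical relation names; the patent does not fix a schema.
SEMANTIC_RELATIONS = {"synonym", "near_synonym", "antonym"}
LABEL_RELATION = "has_label"


class IdiomGraph:
    """Minimal idiom knowledge graph: entities, feature labels, attributes, semantic edges."""

    def __init__(self):
        self.entities = set()
        self.labels = defaultdict(set)       # idiom entity -> feature labels
        self.attributes = defaultdict(dict)  # idiom entity -> {attribute: attribute value}
        self.edges = defaultdict(set)        # (idiom entity, relation) -> related idiom entities

    def add_triple(self, head, relation, tail):
        self.entities.add(head)
        if relation == LABEL_RELATION:
            # label relation between an idiom entity and a feature label
            self.labels[head].add(tail)
        elif relation in SEMANTIC_RELATIONS:
            # semantic relation between two idiom entities, stored in both directions
            self.entities.add(tail)
            self.edges[(head, relation)].add(tail)
            self.edges[(tail, relation)].add(head)
        else:
            # any other relation is read as (entity, attribute, attribute value)
            self.attributes[head][relation] = tail


def build_idiom_graph(triples):
    graph = IdiomGraph()
    for head, relation, tail in triples:
        graph.add_triple(head, relation, tail)
    return graph


if __name__ == "__main__":
    triples = [
        ("A", "synonym", "B"),
        ("A", "near_synonym", "C"),
        ("A", "antonym", "D"),
        ("A", "has_label", "military"),
        ("A", "meaning", "placeholder meaning text"),
    ]
    graph = build_idiom_graph(triples)
    print(sorted(graph.entities), graph.labels["A"], graph.edges[("B", "synonym")])
```

Storing each semantic edge in both directions assumes these relations are symmetric, which holds for synonym, near-synonym and antonym pairs.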

Step 304: Construct an idiom knowledge graph from the structured data, so that the graph contains idiom entities linked by semantic relations, together with the attributes and at least one feature label of each idiom entity.

In one or more embodiments of the present application, as shown in FIG. 4, the constructed idiom knowledge graph contains synonym, near-synonym, and antonym relations. Suppose idiom entities A, B, C, and D are in the graph, where A and B are synonyms, A and C are near-synonyms, and A and D are antonyms. Then A and B should have identical feature labels, as with 暗度陈仓 ("secretly crossing at Chencang") and "bright-dark". Two near-synonyms have similar meanings but are often used to describe different things, so A and C need only share at least one feature label, as with 暗度陈仓 and "transfer wood".
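The label constraints described for FIG. 4 (synonyms carry identical feature labels, near-synonyms share at least one) can be checked mechanically while the graph is built. The following is a hypothetical validation helper, not part of the described method: it reports annotated pairs that violate those constraints, and it places no constraint on antonym pairs because none is stated above. Label names other than "military" and "surreptitious" are made up for the example.

```python
def find_label_violations(labels, related_pairs):
    """labels: {idiom: set of feature labels};
    related_pairs: [(idiom_a, idiom_b, relation)].
    Returns the pairs whose feature labels contradict their annotated relation."""
    violations = []
    for a, b, relation in related_pairs:
        labels_a = labels.get(a, set())
        labels_b = labels.get(b, set())
        if relation == "synonym" and labels_a != labels_b:
            # synonyms are expected to carry identical feature labels
            violations.append((a, b, relation))
        elif relation == "near_synonym" and not (labels_a & labels_b):
            # near-synonyms are expected to share at least one feature label
            violations.append((a, b, relation))
    return violations


if __name__ == "__main__":
    labels = {
        "A": {"military", "surreptitious"},
        "B": {"military", "surreptitious"},
        "C": {"surreptitious", "trickery"},
        "D": {"openness"},
    }
    pairs = [("A", "B", "synonym"), ("A", "C", "near_synonym"), ("A", "D", "antonym")]
    print(find_label_violations(labels, pairs))
```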

Step 306: Acquire the word embedding vector of each idiom entity in the idiom knowledge graph from a preset Chinese word and phrase embedding corpus.

In one or more embodiments of the present application, an existing Chinese word and phrase embedding corpus already stores word embedding vectors, pre-trained by a model, for Chinese words and phrases including idioms. The system can load the vectors of the finite set of idiom entities in the idiom knowledge graph for the subsequent similarity calculation.
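Loading only the vectors needed for the graph (step 306) might look like the sketch below. It assumes the embedding corpus is distributed in the common word2vec text format (a header line "vocabulary_size dimension" followed by one "word v1 v2 ..." line per entry); the patent names neither a corpus nor a file format, and the function name is hypothetical.

```python
import io


def load_idiom_vectors(stream, idiom_entities):
    """Read word2vec-style text from `stream`, keeping only the vectors
    of idiom entities that exist in the knowledge graph."""
    header = stream.readline().split()
    dimension = int(header[1])
    vectors = {}
    for line in stream:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word, values = parts[0], parts[1:]
        if word in idiom_entities and len(values) == dimension:
            vectors[word] = [float(v) for v in values]
    return vectors


if __name__ == "__main__":
    sample = "3 2\nA 0.1 0.2\nweather 1.0 1.0\nB 0.3 0.4\n"
    print(load_idiom_vectors(io.StringIO(sample), {"A", "B"}))
```

Filtering while reading keeps memory proportional to the number of idioms in the graph rather than to the full vocabulary of the embedding corpus.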

In this method, the idiom knowledge graph is constructed from structured data, and synonyms are distinguished from near-synonyms based on feature labels, which supports users in obtaining idiom information from multiple perspectives.

FIG. 5 shows a method for generating an idiom synonym list according to an embodiment of the present disclosure, including steps 502 to 516.

Step 502: Obtain structured data from a preset corpus database, where the structured data includes a plurality of idiom entities, a plurality of feature labels, idiom attribute information, semantic relation information among the idiom entities, and label relation information between the idiom entities and the feature labels.

Step 504: Construct an idiom knowledge graph from the structured data, so that the graph contains idiom entities linked by semantic relations, together with the attributes and at least one feature label of each idiom entity.

In one or more embodiments of the present application, as shown in FIG. 4, the constructed idiom knowledge graph contains synonym, near-synonym, and antonym relations. If idiom entities A, B, C, and D are in the graph, where A and B are synonyms, A and C are near-synonyms, and A and D are antonyms, then A and B should have identical feature labels.

Step 506: Acquire a question sentence input by a user, perform Chinese word segmentation on it, and obtain the text data corresponding to the target idiom in the question sentence.

In one or more embodiments of the present application, after acquiring the question sentence input by the user, the system first segments it with a Chinese word segmentation technique from natural language processing, thereby extracting the target idiom and obtaining the substring corresponding to it, i.e., the text data.
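The patent does not specify which Chinese word segmentation technique is used. As one simple possibility, the sketch below applies dictionary-based forward maximum matching against an idiom lexicon, taking the longest lexicon entry that starts at each position. The lexicon and the function name are hypothetical.

```python
def extract_idiom_substrings(question, idiom_lexicon):
    """Forward maximum matching: scan left to right and, at each position,
    take the longest idiom from the lexicon that starts there."""
    max_len = max((len(w) for w in idiom_lexicon), default=0)
    found = []
    i = 0
    while i < len(question):
        for length in range(min(max_len, len(question) - i), 0, -1):
            piece = question[i:i + length]
            if piece in idiom_lexicon:
                found.append(piece)
                i += length
                break
        else:
            i += 1  # no idiom starts at this position; move on one character
    return found


if __name__ == "__main__":
    lexicon = {"暗度陈仓", "明修栈道"}
    print(extract_idiom_substrings("暗度陈仓的同义词有哪些", lexicon))
```

Preferring the longest match matters when a shorter lexicon entry is a prefix of a longer idiom; a production system would more likely use a full statistical segmenter with the idiom list as a user dictionary.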

Step 508: Based on the text data corresponding to the target idiom and a pattern matching algorithm, acquire from the corpus database the idiom entity that matches the text data, so as to identify the target idiom.

In one or more embodiments of the present application, the system applies a pattern matching algorithm with the substring corresponding to the target idiom as the pattern (keyword) and the corpus database as the target, and searches the corpus database for the target idiom so as to identify it.
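Step 508 uses the extracted substring as the pattern and the corpus database as the target. A minimal sketch, assuming the corpus is available as a list of idiom entity strings and using the Horspool algorithm (one of the algorithms listed earlier, chosen here only for illustration). Ranking exact matches first is an added assumption, meant for entries such as a longer set phrase that merely contains the idiom; both function names are hypothetical.

```python
def horspool_find(text, pattern):
    """Boyer-Moore-Horspool: return the first index of pattern in text, or -1 if absent."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # For each character in pattern[:-1], distance from its last occurrence to the pattern end.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1  # compare from right to left
        if j < 0:
            return pos
        # shift by the text character aligned with the last pattern position
        pos += shift.get(text[pos + m - 1], m)
    return -1


def match_idiom_entities(fragment, corpus_entities):
    """Return corpus idiom entities containing the fragment: exact matches first, then shortest."""
    matches = [e for e in corpus_entities if horspool_find(e, fragment) >= 0]
    matches.sort(key=lambda e: (e != fragment, len(e)))
    return matches


if __name__ == "__main__":
    corpus = ["明修栈道，暗度陈仓", "移花接木", "暗度陈仓"]
    print(match_idiom_entities("暗度陈仓", corpus))
```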

Step 510: Determine at least one feature label corresponding to the target idiom in the idiom knowledge graph.

In one or more embodiments of the present application, after determining the target idiom, the system uses the idiom knowledge graph to determine at least one feature label of the target idiom. The feature labels have already been annotated and manually reviewed, and mark attribute or descriptive information of the idiom. For example, the idiom 暗度陈仓 ("secretly crossing at Chencang") means "to confuse the enemy with a feint from the front that conceals one's real route of attack, then strike suddenly from the flank and take the enemy by surprise; by extension, to mislead others with an open action while carrying out an unexpected plan, and also to carry out activities in secret". Its feature labels may include "military", "purported" and "surreptitious".

Step 512: Based on the at least one feature label of the target idiom, acquire from the idiom knowledge graph at least one idiom entity having the same feature label as the target idiom as a candidate idiom, and generate an idiom recommendation list corresponding to the at least one candidate idiom.

In one or more embodiments of the present application, since each idiom entity in the idiom knowledge graph is annotated with a plurality of feature labels, the system only needs to match feature labels in the graph to obtain at least one idiom entity having the same feature label as the target idiom as a candidate idiom, and to generate the corresponding idiom recommendation list. This ensures that each selected candidate idiom shares the main morpheme of the target idiom.
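Steps 510 and 512 amount to a label lookup followed by a join on shared labels. The sketch below assumes one reading of "the same feature label", namely that a candidate shares at least one label with the target, and precomputes a hypothetical inverted index from label to idioms so that the lookup does not scan the whole graph. All names and labels are placeholders.

```python
from collections import defaultdict


def build_label_index(idiom_labels):
    """Invert {idiom: feature labels} into {feature label: idioms carrying it}."""
    index = defaultdict(set)
    for idiom, labels in idiom_labels.items():
        for label in labels:
            index[label].add(idiom)
    return index


def recommend_candidates(target, idiom_labels, label_index):
    """Idiom recommendation list: every other idiom sharing at least one feature label with target."""
    candidates = set()
    for label in idiom_labels.get(target, set()):
        candidates |= label_index.get(label, set())
    candidates.discard(target)  # the target is not its own candidate
    return sorted(candidates)


if __name__ == "__main__":
    idiom_labels = {
        "T": {"military", "surreptitious"},
        "A": {"military"},
        "B": {"romance"},
        "C": {"surreptitious", "military"},
    }
    index = build_label_index(idiom_labels)
    print(recommend_candidates("T", idiom_labels, index))
```

The list produced here is deliberately broad; near-synonyms that share a label are expected to survive this step and be removed later by the embedding similarity screening.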

Step 514: Calculate the similarity between the word embedding vector of the target idiom and the word embedding vector of each candidate idiom in the idiom recommendation list, to obtain a similarity value between each candidate idiom and the target idiom.

Step 516: Screen the candidate idioms in the idiom recommendation list according to the similarity value between each candidate idiom and the target idiom, to obtain an idiom recommendation list containing only candidate idioms that are synonymous with the target idiom.

By using the associations between idioms and their feature labels, the method distinguishes synonyms from near-synonyms, so that the synonyms the user needs are identified and the near-synonyms mixed in among them are filtered out, and the candidate idioms in the idiom recommendation list can be interchanged with the target idiom in any context.

FIG. 6 shows a method for generating an idiom synonym list according to an embodiment of the present disclosure, including steps 602 to 620.

Step 602: Obtain structured data from a preset corpus database, where the structured data includes a plurality of idiom entities, a plurality of feature labels, idiom attribute information, semantic relation information among the idiom entities, and label relation information between the idiom entities and the feature labels.

Step 604: Construct an idiom knowledge graph from the structured data, so that the graph contains idiom entities linked by semantic relations, together with the attributes and at least one feature label of each idiom entity.

Step 606: Acquire the word embedding vector of each idiom entity in the idiom knowledge graph from a preset Chinese word and phrase embedding corpus.

Step 608: Acquire a question sentence input by a user, and identify a target idiom in the question sentence.

Step 610: Acquire, from the preset idiom knowledge graph, at least one candidate idiom having the same feature label as the target idiom, and generate an idiom recommendation list corresponding to the at least one candidate idiom.

In one or more embodiments of the present application, the system builds the idiom knowledge graph with a feature-label-based construction method. After acquiring the target idiom from the user's question, the system uses the feature labels already annotated for the target idiom in the idiom knowledge graph to match at least one candidate idiom having the same feature label, and generates an idiom recommendation list corresponding to the at least one candidate idiom.

Step 612: Determine, based on the Chinese word and phrase embedding corpus, the word embedding vector of the target idiom and the word embedding vector of each candidate idiom in the idiom recommendation list.

In one or more embodiments of the present application, from the loaded word embedding vectors of the finite set of idiom entities in the idiom knowledge graph, the system determines the word embedding vector of the target idiom and that of each candidate idiom in the idiom recommendation list.

Step 614: and based on a similarity algorithm, respectively calculating cosine similarity between the word embedding vector corresponding to the target idiom and the word embedding vector corresponding to each candidate idiom.

In one or more embodiments of the present application, the system calculates, based on a similarity algorithm, cosine similarity of a word embedding vector corresponding to the target idiom and a word embedding vector corresponding to each candidate idiom, where cosine similarity uses a cosine value of an included angle of two vectors in a vector space as a measure of a difference between two individuals, the cosine similarity is still kept as "1 when the two individuals are identical" in a high-dimensional case, is 0 when the two individuals are orthogonal, and is "1" when the two individuals are orthogonal, and compared with a distance measure, the cosine similarity is more focused on a difference of two vectors in a direction, rather than a distance or a length, and the formula is as follows:

In particular, cosine similarity measures the angle between vectors and reflects differences in direction rather than in position. Two idiom entities can therefore have a high cosine similarity and still be antonyms. Feature labels are needed to ensure that the candidate idiom and the target idiom share the same main morpheme.
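The cosine similarity formula translates directly into code. The sketch below uses only the Python standard library. The function name is illustrative, and rejecting zero vectors is an added assumption, because the source does not say how they are handled. The three sample calls show the 1, 0 and -1 cases described above, and also show that vector length is ignored.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return (a . b) / (|a| |b|), the cosine of the angle between a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])  # ~1.0: length is ignored
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])      # 0.0
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])       # -1.0
```

The first call compares a vector with itself scaled by two and still scores about 1.0. This is the direction-only behaviour that, as noted above, can let antonyms score highly, which is why the feature-label step is applied first.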

Step 616: Compare the cosine similarity between the word embedding vector corresponding to the target idiom and the word embedding vector corresponding to each candidate idiom with a similarity threshold, and judge whether it is greater than or equal to the similarity threshold. If yes, go to step 618; if not, go to step 620.

Step 618: Retain the candidate idiom in the idiom recommendation list.

In one or more embodiments of the present application, when the cosine similarity between the word embedding vector corresponding to the target idiom and the word embedding vector corresponding to the candidate idiom is greater than or equal to the similarity threshold, the two idioms are considered highly similar and may be determined to be synonyms. The candidate idiom is therefore retained in the idiom recommendation list.

Step 620: Remove the candidate idiom from the idiom recommendation list.

In one or more embodiments of the present application, when the cosine similarity between the word embedding vector corresponding to the target idiom and the word embedding vector corresponding to the candidate idiom is smaller than the similarity threshold, the similarity between the two idioms is considered too weak to determine that they are synonyms. The candidate idiom is therefore removed from the idiom recommendation list.

Optionally, the similarity threshold may be 0.9.
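Steps 616 to 620 amount to one pass over the recommendation list with a greater-than-or-equal comparison. In the sketch below the similarity values are passed in as a precomputed mapping, which is an illustrative choice rather than something the source prescribes. Treating a candidate with no similarity value (for example, one with no embedding) as removed is also an assumption. The sample scores are placeholders, not model outputs.

```python
from typing import Dict, List

SIMILARITY_THRESHOLD = 0.9  # the optional value given in the text


def screen_candidates(
    recommendation_list: List[str],
    similarities: Dict[str, float],
    threshold: float = SIMILARITY_THRESHOLD,
) -> List[str]:
    """Apply steps 616-620 to every candidate in the recommendation list.

    A candidate is retained (step 618) when its cosine similarity to the
    target idiom is >= threshold and removed (step 620) otherwise.
    Assumption: a candidate with no similarity value (e.g. no embedding)
    is treated as below the threshold and removed.
    """
    screened = []
    for candidate in recommendation_list:
        score = similarities.get(candidate)
        if score is not None and score >= threshold:
            screened.append(candidate)  # step 618: retain
        # otherwise step 620: leave it out of the screened list
    return screened


# Placeholder similarity values, not outputs of a real embedding model.
kept = screen_candidates(
    ["持之以恒", "锲而不舍", "画蛇添足"],
    {"持之以恒": 0.93, "锲而不舍": 0.90, "画蛇添足": 0.42},
)
```

A candidate that scores exactly 0.9 is kept, matching the "greater than or equal to" wording of step 616.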

In this method, the cosine similarity between the target idiom and each candidate idiom is calculated from their word embedding vectors. The cosine of the angle between the two vectors in the vector space serves as the measure of difference between two idiom entities, so whether two idioms are synonyms can be judged accurately and reliably.

Corresponding to the above method embodiment, the present disclosure further provides an embodiment of a device for generating an idiom synonym list. Fig. 7 shows a schematic structural diagram of the device for generating an idiom synonym list according to one embodiment of the present disclosure. As shown in Fig. 7, the device includes:

an idiom recognition module 701, configured to obtain a question sentence input by a user and identify a target idiom from the question sentence;

a list generation module 702, configured to acquire, from a preset idiom knowledge graph, at least one candidate idiom having the same feature tag as the target idiom, and generate an idiom recommendation list corresponding to the at least one candidate idiom;

a similarity calculation module 703, configured to perform similarity calculation between the word embedding vector corresponding to the target idiom and the word embedding vector corresponding to each candidate idiom in the idiom recommendation list, to obtain a similarity value between each candidate idiom and the target idiom; and

a list screening module 704, configured to screen the candidate idioms in the idiom recommendation list according to the similarity value between each candidate idiom and the target idiom, so as to obtain an idiom recommendation list that includes only candidate idioms that are synonyms of the target idiom.
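The device structure above is essentially a four-stage pipeline. The sketch below shows one possible way to wire modules 701 to 704 together, with each module modelled as an injected callable. The class name, field names and callable signatures are hypothetical, and returning an empty list when no idiom is recognized is an assumption. The stub modules and their numbers exist only to show the data flow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class IdiomSynonymListDevice:
    """Hypothetical wiring of modules 701-704 as injected callables."""

    recognize: Callable[[str], Optional[str]]                         # module 701
    generate_list: Callable[[str], List[str]]                         # module 702
    compute_similarity: Callable[[str, List[str]], Dict[str, float]]  # module 703
    screen: Callable[[List[str], Dict[str, float]], List[str]]        # module 704

    def run(self, question: str) -> List[str]:
        target = self.recognize(question)
        if target is None:
            return []  # assumption: no idiom recognized means an empty list
        candidates = self.generate_list(target)
        similarities = self.compute_similarity(target, candidates)
        return self.screen(candidates, similarities)


# Stub modules with made-up values, only to show the data flow.
device = IdiomSynonymListDevice(
    recognize=lambda q: "坚持不懈" if "坚持不懈" in q else None,
    generate_list=lambda target: ["持之以恒", "画蛇添足"],
    compute_similarity=lambda target, cands: {"持之以恒": 0.93, "画蛇添足": 0.42},
    screen=lambda cands, sims: [c for c in cands if sims[c] >= 0.9],
)
result = device.run("坚持不懈的近义词有哪些？")
```

Injecting the modules keeps each one replaceable. For example, a different similarity algorithm could be swapped into module 703 without touching the other three.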

Optionally, the apparatus further includes:

a list returning module, configured to return to the user the idiom recommendation list containing the candidate idioms that are synonyms of the target idiom.

Optionally, the apparatus further includes:

a data acquisition module, configured to acquire structured data from a preset corpus database, wherein the structured data includes a plurality of idiom entities, a plurality of feature tags, idiom attribute information, semantic relation information between the idiom entities, and tag relation information between the idiom entities and the feature tags; and

a graph construction module, configured to construct an idiom knowledge graph from the structured data, so that the idiom knowledge graph contains idiom entities having semantic relations, together with the attributes and at least one feature tag corresponding to each idiom entity.
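To make the data-acquisition and graph-construction modules more concrete, here is a minimal in-memory sketch. It is not a graph database. It assumes the structured data arrives as three kinds of records:

- (idiom, attributes) pairs;
- (idiom, feature tag) links;
- (head idiom, relation, tail idiom) triples.

All class, function, field and relation names (such as "近义") are hypothetical. A reverse tag-to-idiom index is added so that candidates sharing a tag can be looked up without scanning the whole graph.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, Dict, Iterable, Set, Tuple


@dataclass
class IdiomKnowledgeGraph:
    """Minimal in-memory idiom knowledge graph (a sketch, not a graph database)."""

    attributes: Dict[str, Dict[str, str]] = field(default_factory=dict)
    tags: DefaultDict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))
    # Reverse index: feature tag -> idioms carrying it, for fast candidate lookup.
    tag_index: DefaultDict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))
    # Semantic relations stored as idiom -> {(relation, other_idiom), ...}.
    relations: DefaultDict[str, Set[Tuple[str, str]]] = field(
        default_factory=lambda: defaultdict(set)
    )

    def idioms_with_tag(self, tag: str) -> Set[str]:
        # .get() avoids inserting an empty entry for unknown tags.
        return set(self.tag_index.get(tag, set()))


def build_graph(
    entities: Iterable[Tuple[str, Dict[str, str]]],
    tag_links: Iterable[Tuple[str, str]],
    semantic_links: Iterable[Tuple[str, str, str]],
) -> IdiomKnowledgeGraph:
    """Assemble the graph from the three kinds of structured records."""
    graph = IdiomKnowledgeGraph()
    for idiom, attrs in entities:                # idiom entities + attribute info
        graph.attributes[idiom] = dict(attrs)
    for idiom, tag in tag_links:                 # idiom-to-feature-tag relations
        graph.tags[idiom].add(tag)
        graph.tag_index[tag].add(idiom)
    for head, relation, tail in semantic_links:  # idiom-to-idiom semantic relations
        graph.relations[head].add((relation, tail))
    return graph


graph = build_graph(
    entities=[("持之以恒", {"pinyin": "chí zhī yǐ héng"}),
              ("锲而不舍", {"pinyin": "qiè ér bù shě"})],
    tag_links=[("持之以恒", "毅力"), ("锲而不舍", "毅力")],
    semantic_links=[("持之以恒", "近义", "锲而不舍")],
)
```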

Optionally, the apparatus further includes:

a word vector loading module, configured to acquire, from a preset Chinese character word and sentence embedding corpus, the word embedding vector corresponding to each idiom entity in the idiom knowledge graph.

Optionally, the idiom recognition module includes:

a word segmentation unit, configured to acquire a question sentence input by a user, perform Chinese word segmentation on the question sentence, and acquire the text data corresponding to the target idiom in the question sentence; and

a keyword searching unit, configured to acquire from the corpus database the idiom entity that matches the text data corresponding to the target idiom, based on that text data and a pattern matching algorithm, so as to identify the target idiom.
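The source names a Chinese word segmentation step and a pattern matching algorithm but does not specify either. The sketch below substitutes two techniques of its own:

- dictionary-based forward maximum matching, using the idiom corpus as the dictionary, for segmentation;
- exact lookup against the corpus for the matching step.

A production system would more likely use a dedicated Chinese segmenter. Returning the first idiom found when a question contains several is an assumption, and the function names are hypothetical.

```python
from typing import Iterable, List, Optional, Set


def forward_max_match(text: str, vocab: Set[str]) -> List[str]:
    """Dictionary-based forward maximum matching segmentation.

    At each position, take the longest vocabulary entry that starts there,
    falling back to a single character when nothing matches.
    """
    max_len = max((len(w) for w in vocab), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def recognize_target_idiom(question: str, idiom_corpus: Iterable[str]) -> Optional[str]:
    """Return the first segmented token that exactly matches a corpus idiom."""
    idioms = set(idiom_corpus)
    for token in forward_max_match(question, idioms):
        if token in idioms:
            return token
    return None


target = recognize_target_idiom("坚持不懈的近义词有哪些？", ["坚持不懈", "持之以恒"])
```

Preferring the longest match first prevents a four-character idiom from being split into shorter fragments before it can be matched.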

Optionally, the list generating module includes:

a tag determining unit, configured to determine at least one feature tag corresponding to the target idiom in the idiom knowledge graph; and

a tag matching unit, configured to acquire from the idiom knowledge graph, as candidate idioms, the idiom entities that have the same feature tag as the target idiom, based on the at least one feature tag corresponding to the target idiom.

Optionally, the similarity calculation module includes:

a word vector determining unit, configured to determine, based on the Chinese character word and sentence embedding corpus, the word embedding vector corresponding to the target idiom and the word embedding vector corresponding to each candidate idiom in the idiom recommendation list; and

a cosine similarity calculation unit, configured to calculate, based on a similarity algorithm, the cosine similarity between the word embedding vector corresponding to the target idiom and the word embedding vector corresponding to each candidate idiom.

Optionally, the list screening module includes:

a threshold comparison unit, configured to compare the cosine similarity between the word embedding vector corresponding to the target idiom and the word embedding vector corresponding to each candidate idiom with a similarity threshold, and to judge whether it is greater than or equal to the similarity threshold; if yes, the retaining unit is executed, and if not, the removing unit is executed;

a retaining unit configured to retain the candidate idioms in the idiom recommendation list;

a removing unit, configured to remove the candidate idiom from the idiom recommendation list.

Optionally, the similarity threshold is 0.9.

An embodiment of the present application also provides a computing device including a memory, a processor, and computer instructions stored on the memory and executable on the processor, the processor implementing the following steps when executing the instructions:

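acquiring a question sentence input by a user, and identifying a target idiom from the question sentence;

acquiring, from a preset idiom knowledge graph, at least one candidate idiom having the same feature label as the target idiom, and generating an idiom recommendation list corresponding to the at least one candidate idiom;

performing similarity calculation between the word embedding vector corresponding to the target idiom and the word embedding vector corresponding to each candidate idiom in the idiom recommendation list, to obtain a similarity value between each candidate idiom and the target idiom; and

screening the candidate idioms in the idiom recommendation list according to the similarity value between each candidate idiom and the target idiom, so as to obtain an idiom recommendation list that includes only candidate idioms that are synonyms of the target idiom.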

An embodiment of the present application also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the idiom synonym list generation method described above.

The above is an exemplary version of the computer-readable storage medium of this embodiment. The technical solution of the computer-readable storage medium and the technical solution of the idiom synonym list generation method share the same concept. For any details of the storage medium's technical solution not described here, refer to the description of the technical solution of the idiom synonym list generation method.

The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, or a software distribution medium. The content of the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a given jurisdiction. For example, in certain jurisdictions, under legislation and patent practice, computer-readable media do not include electrical carrier signals or telecommunication signals.

For simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. However, those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.

In the foregoing embodiments, each embodiment's description has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

The preferred embodiments of the present application disclosed above are intended only to help explain the present application. They do not describe every detail exhaustively, nor do they limit the invention to the specific implementations described. Many modifications and variations are possible in light of the content of this specification. These embodiments were chosen and described in detail to best explain the principles and practical application of the present application, so that those skilled in the art can understand and use it well. The present application is limited only by the claims and their full scope and equivalents.

Claims

1. A method for generating an idiom synonym list, characterized by comprising the following steps:

acquiring a question statement input by a user, and identifying a target idiom from the question statement input by the user;

acquiring, from a preset idiom knowledge graph, at least one candidate idiom having the same feature label as the target idiom, and generating an idiom recommendation list corresponding to the at least one candidate idiom, wherein having the same feature label means that the feature label of the target idiom is the same as the feature label of the candidate idiom, and the feature label is used to mark attribute or description information of the target idiom;

performing similarity calculation between the word embedding vector corresponding to the target idiom and the word embedding vector corresponding to each candidate idiom in the idiom recommendation list, to obtain a similarity value between each candidate idiom and the target idiom; and

screening the candidate idioms in the idiom recommendation list according to the similarity value between each candidate idiom and the target idiom and a similarity threshold, so as to obtain an idiom recommendation list that includes only candidate idioms that are synonyms of the target idiom.

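2. The method of claim 1, further comprising, after obtaining the idiom recommendation list that includes only candidate idioms that are synonyms of the target idiom:

returning the idiom recommendation list containing the candidate idioms that are synonyms of the target idiom to the user.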

3. The method of claim 1, further comprising, prior to obtaining the question statement entered by the user:

obtaining structured data from a preset corpus database, wherein the structured data comprises a plurality of idiom entities, a plurality of feature tags, idiom attribute information, semantic relation information among the idiom entities and tag relation information among the idiom entities and the feature tags;

constructing an idiom knowledge graph according to the structured data, so that the idiom knowledge graph contains idiom entities having semantic relations, together with the attributes and at least one feature label corresponding to each idiom entity.

4. The method according to claim 3, further comprising, after constructing the idiom knowledge graph from the structured data:

and acquiring word embedding vectors corresponding to each idiom entity in the idiom knowledge graph from a preset Chinese character word and sentence embedding corpus.

5. The method of claim 3, wherein acquiring the question sentence input by the user and identifying the target idiom from the question sentence input by the user comprises:

acquiring the question statement input by the user, and performing Chinese word segmentation on the question statement to acquire text data corresponding to the target idiom in the question statement; and

and acquiring idiom entities matched with the text data corresponding to the target idiom from the corpus database based on the text data corresponding to the target idiom and a pattern matching algorithm so as to identify the target idiom.

6. The method of claim 5, wherein the obtaining at least one candidate idiom having the same feature tag as the target idiom in a preset idiom knowledge graph comprises:

determining at least one feature tag corresponding to the target idiom in the idiom knowledge graph;

acquiring from the idiom knowledge graph, as a candidate idiom, at least one idiom entity having the same feature label as the target idiom, based on the at least one feature label corresponding to the target idiom.

7. The method of claim 4, wherein performing similarity calculation between the word embedding vector corresponding to the target idiom and the word embedding vector corresponding to each candidate idiom in the idiom recommendation list, to obtain a similarity value between each candidate idiom and the target idiom, comprises:

determining, based on the Chinese character word and sentence embedding corpus, the word embedding vector corresponding to the target idiom and the word embedding vector corresponding to each candidate idiom in the idiom recommendation list; and

and based on a similarity algorithm, respectively calculating cosine similarity between the word embedding vector corresponding to the target idiom and the word embedding vector corresponding to each candidate idiom.

8. The method of claim 7, wherein screening the candidate idioms in the idiom recommendation list according to the similarity value between each candidate idiom and the target idiom and the similarity threshold comprises:

comparing the cosine similarity between the word embedding vector corresponding to the target idiom and the word embedding vector corresponding to each candidate idiom with the similarity threshold, and judging whether it is greater than or equal to the similarity threshold;

retaining the candidate idiom in the idiom recommendation list when the cosine similarity between the word embedding vector corresponding to the target idiom and the word embedding vector corresponding to the candidate idiom is greater than or equal to the similarity threshold; and

removing the candidate idiom from the idiom recommendation list when the cosine similarity between the word embedding vector corresponding to the target idiom and the word embedding vector corresponding to the candidate idiom is smaller than the similarity threshold.

9. The method of claim 1, wherein the similarity threshold is 0.9.

10. An idiom synonym list generation device, comprising:

an idiom recognition module, configured to acquire a question statement input by a user and identify a target idiom from the question statement input by the user;

a list generation module, configured to acquire, from a preset idiom knowledge graph, at least one candidate idiom having the same feature label as the target idiom, and to generate an idiom recommendation list corresponding to the at least one candidate idiom, wherein having the same feature label means that the feature label of the target idiom is the same as the feature label of the candidate idiom, and the feature label is used to mark attribute or description information of the target idiom;

a similarity calculation module, configured to perform similarity calculation between the word embedding vector corresponding to the target idiom and the word embedding vector corresponding to each candidate idiom in the idiom recommendation list, to obtain a similarity value between each candidate idiom and the target idiom; and

a list screening module, configured to screen the candidate idioms in the idiom recommendation list according to the similarity value between each candidate idiom and the target idiom and a similarity threshold, so as to obtain an idiom recommendation list that includes only candidate idioms that are synonyms of the target idiom.

11. A computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the method of any of claims 1-9.

12. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 9.