WO2023078414A1 - Related article search method and apparatus, electronic device, and storage medium

Info

Publication number
WO2023078414A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
name
article
determining
person
Prior art date
Application number
PCT/CN2022/129963
Other languages
French (fr)
Chinese (zh)
Inventor
王超超
王为磊
屠昶旸
张济徽
Original Assignee
智慧芽信息科技(苏州)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 智慧芽信息科技(苏州)有限公司
Publication of WO2023078414A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3344: Query execution using natural language analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/237: Lexical tools
    • G06F40/247: Thesauruses; Synonyms

Abstract

The present disclosure relates to a related article search method and apparatus, an electronic device, and a storage medium. The method comprises: determining a first article, and determining search information according to the attribute information comprised in the first article; according to the search information of the first article, searching to obtain a first preset number of candidate articles so as to match second name information and second address information in each candidate article according to first name information and first address information comprised in the attribute information of the first article; and finally, determining, according to a matching value of each candidate article and the first article, a second article as an article matching the first article. According to embodiments of the present disclosure, the related candidate articles can be searched according to the attribute information in the first article, and the association with the candidate articles is established according to the name and the address therein so as to obtain the second article having high relevancy, so that the accuracy of the related article search results is improved.

Description

Related article search method, apparatus, electronic device, and storage medium
Technical field
The present disclosure relates to the field of computer technology, and in particular to a related article search method, apparatus, electronic device, and storage medium.
Background of the invention
When browsing an article, users often want to browse articles related to it. At present, related articles are found by searching directly on the author's name. When the author's name is written in different ways, this approach struggles to find the related articles, so the accuracy of the search results is low.
Summary of the invention
In view of this, the present disclosure proposes a related article search method, apparatus, electronic device, and storage medium, aiming to improve the accuracy of related article search results.
According to a first aspect of the present disclosure, a related article search method is provided, the method including:
determining a first article that includes attribute information, the attribute information including at least first person name information and first address information;
determining search information corresponding to the first article according to the attribute information;
searching an article collection for a first preset number of candidate articles according to the search information, each candidate article including corresponding second person name information and second address information;
determining a matching value between the first article and each candidate article according to the first person name information and first address information and each piece of second person name information and second address information;
determining a second article among the candidate articles according to the corresponding matching values.
In a possible implementation, determining the matching value between the first article and each candidate article according to the first person name information and first address information and each piece of second person name information and second address information includes:
determining a first similarity between each candidate article and the first article according to the first person name information and each piece of second person name information;
determining a second similarity between each candidate article and the first article according to the first address information and each piece of second address information;
determining the matching value according to the first similarity and the second similarity corresponding to each candidate article.
In a possible implementation, determining the first similarity between each candidate article and the first article according to the first person name information and each piece of second person name information includes:
determining at least one first person name included in the first person name information, and at least one second person name included in each piece of second person name information;
for each first person name, determining at least one corresponding piece of synonym information and a relevance score corresponding to each piece of synonym information;
for each piece of second person name information, determining the relationship between each second person name included therein and each piece of synonym information;
determining the first similarity according to the sum of the relevance scores of the synonym information corresponding to each second person name.
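The first-similarity steps above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the "relationship" between a second person name and a piece of synonym information is an exact match after lowercasing and trimming, and the function name `first_similarity` and the example names and scores are hypothetical.

```python
def normalize(name):
    """Lowercase and trim a name so that trivial differences do not block a match."""
    return name.strip().lower()


def first_similarity(synonym_scores, second_names):
    """Sum the relevance scores of the synonyms matched by a candidate's names.

    synonym_scores: dict mapping each synonym of the first article's person
        names to its relevance score (hypothetical values).
    second_names: person names found in one candidate article.
    """
    lookup = {normalize(s): score for s, score in synonym_scores.items()}
    total = 0.0
    for name in second_names:
        # Assumption: a second person name "corresponds" to a synonym
        # only when the normalized strings are identical.
        total += lookup.get(normalize(name), 0.0)
    return total


# Hypothetical synonyms and scores for one inventor name.
scores = {"Jane Smith": 10, "Smith Jane": 9, "J. Smith": 6}
similarity = first_similarity(scores, ["smith jane", "Li Lei"])
```

Here only "smith jane" matches a synonym, so the candidate's first similarity is the score of that single synonym.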
In a possible implementation, determining, for each first person name, the at least one corresponding piece of synonym information and the relevance score corresponding to each piece of synonym information includes:
determining at least one piece of synonym information corresponding to each first person name;
determining the relevance score corresponding to each piece of synonym information by comparing the differences between that synonym information and the corresponding first person name.
In a possible implementation, determining the at least one piece of synonym information corresponding to each first person name includes:
in response to the first person name being an English name, determining the English surname and English given name included in the English name;
inputting the English surname and English given name corresponding to each first person name into a plurality of preset name format templates to obtain the corresponding synonym information.
In a possible implementation, determining the at least one piece of synonym information corresponding to each first person name further includes:
in response to the first person name being a Chinese name, performing character conversion on the Chinese name to obtain at least one corresponding piece of synonym information;
or, determining the Chinese surname and Chinese given name included in the Chinese name;
converting the Chinese surname and Chinese given name into an English surname and English given name by pinyin conversion.
In a possible implementation, determining the relevance score by comparing the differences between each piece of synonym information and the corresponding first person name includes:
determining a first score corresponding to each first person name;
determining second scores respectively corresponding to a plurality of preset differences;
in response to at least one difference existing between the synonym information and the corresponding first person name, determining, as the relevance score corresponding to the synonym information, the first score corresponding to the first person name minus the sum of the second scores corresponding to the differences.
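The scoring rule above (first score minus the sum of the second scores of the differences present) can be written as a short sketch. The difference categories and their values in `DIFFERENCE_PENALTIES` are hypothetical, as is the choice to return the unmodified first score when no difference exists.

```python
# Hypothetical preset differences and their second scores.
DIFFERENCE_PENALTIES = {
    "order_swapped": 1,
    "given_name_abbreviated": 3,
    "punctuation_added": 1,
}


def relevance_score(first_score, differences):
    """Return the first score minus the summed second scores of the differences.

    differences: names of the preset differences found between a synonym and
        the original first person name. An empty list leaves the first score
        unchanged (an assumption; the text only covers the case of at least
        one difference).
    """
    penalty = sum(DIFFERENCE_PENALTIES[d] for d in differences)
    return first_score - penalty


score = relevance_score(10, ["order_swapped", "given_name_abbreviated"])
```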
In a possible implementation, determining the second similarity between each candidate article and the first article according to the first address information and each piece of second address information includes:
calculating the edit distance between the first address information and each piece of second address information to obtain the second similarity between each candidate article and the first article.
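A minimal sketch of the address comparison follows, using the standard Levenshtein edit distance. How the distance is turned into a similarity is not stated here, so the normalization in `address_similarity` (one minus the distance divided by the longer length) is an assumption, and both function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def address_similarity(first_address, second_address):
    """Map the edit distance to [0, 1]; 1.0 means identical strings (assumed scaling)."""
    longest = max(len(first_address), len(second_address))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(first_address, second_address) / longest


distance = edit_distance("kitten", "sitting")
```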
In a possible implementation, determining the second article among the candidate articles according to the corresponding matching values includes:
determining the second preset number of candidate articles with the largest matching values as the second articles.
In a possible implementation, determining the second article among the candidate articles according to the corresponding matching values includes:
determining the candidate articles whose matching value is greater than a matching threshold as the second articles.
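The two selection strategies just described can be sketched side by side. The dictionary of matching values and the article identifiers are hypothetical example data.

```python
def select_top_k(matching_values, k):
    """Return the k candidate ids with the largest matching values, best first."""
    ranked = sorted(matching_values.items(), key=lambda item: item[1], reverse=True)
    return [article_id for article_id, _ in ranked[:k]]


def select_above_threshold(matching_values, threshold):
    """Return candidate ids whose matching value is strictly greater than the threshold."""
    return [
        article_id
        for article_id, value in matching_values.items()
        if value > threshold
    ]


# Hypothetical matching values for three candidate articles.
values = {"paper-1": 0.9, "paper-2": 0.4, "paper-3": 0.7}
best_two = select_top_k(values, 2)
confident = select_above_threshold(values, 0.5)
```

The top-k variant always returns a fixed number of results, while the threshold variant can return none when no candidate matches well enough.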
In a possible implementation, the attribute information further includes article attributes, and determining the search information corresponding to the first article according to the attribute information includes:
extracting feature information from the article attributes, the feature information including at least one keyword corresponding to the first article;
determining the search information corresponding to the first article according to the feature information, the first person name information, and the first address information.
In a possible implementation, the first article is a patent document and the second article is an academic paper.
In a possible implementation, the article attributes include at least one of the abstract, the claims, the description, the patent title, the technical field, the background, the summary of the invention, and the detailed description; the first person name information includes at least one inventor name, and the first address includes the applicant's address.
In a possible implementation, the second person name information includes at least one paper author name, and the second address includes the address of the institution to which the paper author belongs.
According to a second aspect of the present disclosure, a related article search apparatus is provided, the apparatus including:
a first article determination module, configured to determine a first article that includes attribute information, the attribute information including at least first person name information and first address information;
a search information determination module, configured to determine search information corresponding to the first article according to the attribute information;
an article search module, configured to search an article collection for a first preset number of candidate articles according to the search information, each candidate article including corresponding second person name information and second address information;
a data matching module, configured to determine a matching value between the first article and each candidate article according to the first person name information and first address information and each piece of second person name information and second address information;
a second article determination module, configured to determine a second article among the candidate articles according to the matching value of each candidate article.
In a possible implementation, the data matching module includes:
a first similarity determination submodule, configured to determine a first similarity between each candidate article and the first article according to the first person name information and each piece of second person name information;
a second similarity determination submodule, configured to determine a second similarity between each candidate article and the first article according to the first address information and each piece of second address information;
a matching value determination submodule, configured to determine the matching value according to the first similarity and the second similarity corresponding to each candidate article.
In a possible implementation, the first similarity determination submodule includes:
a person name determination unit, configured to determine at least one first person name included in the first person name information, and at least one second person name included in each piece of second person name information;
a synonym determination unit, configured to determine, for each first person name, at least one corresponding piece of synonym information and a relevance score corresponding to each piece of synonym information;
a person name association unit, configured to determine, for each piece of second person name information, the relationship between each second person name included therein and each piece of synonym information;
a first similarity determination unit, configured to determine the first similarity according to the sum of the relevance scores of the synonym information corresponding to each second person name.
In a possible implementation, the synonym determination unit includes:
a synonym determination subunit, configured to determine at least one piece of synonym information corresponding to each first person name;
a relevance score determination subunit, configured to determine the relevance score corresponding to each piece of synonym information by comparing the differences between that synonym information and the corresponding first person name.
In a possible implementation, the synonym determination subunit includes:
an English information determination subunit, configured to determine, in response to the first person name being an English name, the English surname and English given name included in the English name;
a synonym generation subunit, configured to input the English surname and English given name corresponding to each first person name into a plurality of preset name format templates to obtain the corresponding synonym information.
In a possible implementation, the synonym determination subunit further includes:
a Chinese information determination subunit, configured to perform, in response to the first person name being a Chinese name, character conversion on the Chinese name to obtain at least one corresponding piece of synonym information;
or, to determine the Chinese surname and Chinese given name included in the Chinese name;
an English information conversion subunit, configured to convert the Chinese surname and Chinese given name into an English surname and English given name by pinyin conversion.
In a possible implementation, the relevance score determination subunit includes:
a first score determination unit, configured to determine a first score corresponding to each first person name;
a second score determination subunit, configured to determine second scores respectively corresponding to a plurality of preset differences;
a relevance score calculation subunit, configured to determine, in response to at least one difference existing between the synonym information and the corresponding first person name, the first score corresponding to the first person name minus the sum of the second scores corresponding to the differences as the relevance score corresponding to the synonym information.
In a possible implementation, the second similarity determination submodule includes:
a second similarity determination unit, configured to calculate the edit distance between the first address information and each piece of second address information to obtain the second similarity between each candidate article and the first article.
In a possible implementation, the second article determination module includes:
a first screening submodule, configured to determine the second preset number of candidate articles with the largest matching values as the second articles.
In a possible implementation, the second article determination module includes:
a second screening submodule, configured to determine the candidate articles whose matching value is greater than a matching threshold as the second articles.
In a possible implementation, the attribute information further includes article attributes, and the search information determination module includes:
a feature extraction submodule, configured to extract feature information from the article attributes, the feature information including at least one keyword corresponding to the first article;
a search information determination submodule, configured to determine the search information corresponding to the first article according to the feature information, the first person name information, and the first address information.
In a possible implementation, the first article is a patent document and the second article is an academic paper.
In a possible implementation, the article attributes include at least one of the abstract, the claims, the description, the patent title, the technical field, the background, the summary of the invention, and the detailed description; the first person name information includes at least one inventor name, and the first address includes the applicant's address.
In a possible implementation, the second person name information includes at least one paper author name, and the second address includes the address of the institution to which the paper author belongs.
According to a third aspect of the present disclosure, an electronic device is provided, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to execute the method described in any one of the above.
According to a fourth aspect of the present disclosure, a computer-readable storage medium is provided, on which computer program instructions are stored, the computer program instructions implementing the method described in any one of the above when executed by a processor.
Embodiments of the present disclosure can search for related candidate articles using the attribute information in the first article, and establish associations with the candidate articles according to the person names and addresses therein to obtain a highly relevant second article, which improves the accuracy of related article search results.
Other features and aspects of the present disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate exemplary embodiments, features, and aspects of the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
FIG. 1 shows a flowchart of a related article search method according to an embodiment of the present disclosure;
FIG. 2 shows a schematic diagram of the process of determining search information according to an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of the process of determining candidate articles according to an embodiment of the present disclosure;
FIG. 4 shows a flowchart of determining the matching value of each candidate article according to an embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of the process of determining synonym information for a first person name according to an embodiment of the present disclosure;
FIG. 6 shows a schematic diagram of a related article search apparatus according to an embodiment of the present disclosure;
FIG. 7 shows a schematic diagram of an electronic device according to an embodiment of the present disclosure;
FIG. 8 shows a schematic diagram of another electronic device according to an embodiment of the present disclosure.
Modes for carrying out the invention
Various exemplary embodiments, features, and aspects of the present disclosure are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein specifically to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as superior to or better than other embodiments.
In addition, numerous specific details are given in the detailed description below to better explain the present disclosure. Those skilled in the art will understand that the present disclosure can also be practiced without some of these specific details. In some instances, methods, means, components, and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the present disclosure.
FIG. 1 shows a flowchart of a related article search method according to an embodiment of the present disclosure. In a possible implementation, the related article search method of the embodiments of the present disclosure may be executed by a terminal device or another processing device. The terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. The other processing device may be a server, for example a single server or a server cluster made up of multiple servers. In some possible implementations, the related article search method may be implemented by a processor invoking computer-readable instructions stored in a memory.
In an exemplary application scenario, the related article search method of the embodiments of the present disclosure may be performed on a predetermined patent document. After multiple related papers are found, associations are established based on the inventor names and applicant address in the patent document and the author names and addresses in the papers, so that the papers matching the patent document are accurately identified among the papers found.
As shown in FIG. 1, the related article search method of the embodiments of the present disclosure includes the following steps:
Step S10: determine a first article that includes attribute information.
In a possible implementation, the first article is the article for which related articles are to be retrieved, and it includes attribute information. The first article may be a different kind of article in different application scenarios. For example, when searching for papers related to a patent, the first article may be a patent document; when searching for patents related to a paper, the first article may be an academic paper; when searching for column articles related to a news item, the first article may be the news item.
Optionally, the attribute information is the content in the first article that characterizes its attributes, and it includes at least first person name information and first address information. The first person name information represents the names of persons related to the first article, and the first address information represents addresses related to the first article. Further, the related persons and related addresses differ for different types of first article. For example, when the first article is an academic paper, the related persons may be the authors of the paper and the related address is the address of the institution to which the authors belong. When the first article is a patent document, the related persons may be the inventors of the patent document and the related address may be the applicant's address. Because an article may have one or more authors and a patent may have one or more inventors and applicants, the first person name information may include at least one first person name related to the first article, and the first address information may include at least one address related to the first article.
Further, the attribute information may also include other article attributes that characterize the first article. For example, when the first article is an academic paper, the article attributes may include its title and abstract. When the first article is a patent document, the article attributes may include at least one of its abstract, claims, and description. Optionally, when the article attributes include the description, they may include part or all of the description; the parts of the description may include the patent title, technical field, background, summary of the invention, and detailed description. When the first article is a short story, the article attributes may include its title and preface.
Step S20: determine search information corresponding to the first article according to the attribute information.
In a possible implementation, the corresponding search information is determined according to the attribute information in the first article. The search information characterizes the features of the first article and is used to search for candidate articles related to the first article.
Optionally, the search information may be determined by extracting feature information from the article attributes, where the feature information includes at least one keyword corresponding to the first article, and then determining the search information according to the feature information, the first person name information, and the first address information. For example, at least one keyword is extracted from the article attributes as the feature information. When the first article is a patent document, the feature information includes multiple keywords extracted from at least one of the abstract, the claims, the description, the patent title, the technical field, the background, the summary of the invention, and the detailed description. When the first article is an academic paper, the feature information includes multiple keywords extracted from at least one of the abstract, the title, and the body of the paper. The feature information, the first person name information, and the first address information are then used together as search terms, and the search information is determined to include all of these search terms.
Fig. 2 shows a schematic diagram of a process of determining search information according to an embodiment of the present disclosure. As shown in Fig. 2, after attribute information 20 including article attributes 21, first person name information 23, and first address information 22 is determined, the corresponding feature information 24 is obtained by extracting keywords from the article attributes 21. Search information 25 is then determined according to the first person name information 23, the first address information 22, and the feature information 24.
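The flow of Fig. 2 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the source does not say how keywords are extracted, so a simple term-frequency count is assumed here, and the names `extract_keywords`, `build_search_info`, the stop-word list, and the dictionary keys are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the source does not specify one.
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "for", "in", "to"})


def extract_keywords(text, top_k=5):
    """Return up to top_k frequent terms (assumed extraction method)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(top_k)]


def build_search_info(article_attrs, person_names, addresses, top_k=5):
    """Combine feature keywords, person names and addresses as search terms."""
    keywords = extract_keywords(" ".join(article_attrs), top_k)
    return {
        "keywords": keywords,                # feature information
        "person_names": list(person_names),  # first person name information
        "addresses": list(addresses),        # first address information
    }


info = build_search_info(
    ["A battery electrode with battery coating"],
    ["SORIN FABIEN"],
    ["Paris"],
)
```

Any other keyword extractor could be substituted for `extract_keywords` without changing the shape of the resulting search information.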
Step S30: search an article collection for a first preset number of candidate articles according to the search information.
In a possible implementation, the search information determined in step S20 is used to search a preset article collection containing multiple articles, yielding a first preset number of candidate articles. Optionally, the search in this embodiment may be performed with the ES (Elasticsearch) search engine, which searches the article collection according to the search information and returns the first preset number of candidate articles. In its retrieval process, the ES search engine splits and stores the content of the search information, builds the corresponding search index, and then performs matching retrieval.
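As a rough illustration of the search step, the sketch below builds an Elasticsearch query body (a `bool` query with `should` clauses of `match` queries) from a search-information dictionary. The index field names `text`, `person_names`, and `addresses`, the dictionary layout, and the function name `build_es_query` are assumptions made for this example; the source only states that ES is used. No client call is made, so the block only shows how the request body might be assembled.

```python
def build_es_query(search_info, size):
    """Assemble a query body that returns at most `size` candidate articles."""
    should = [{"match": {"text": kw}} for kw in search_info["keywords"]]
    should += [{"match": {"person_names": n}} for n in search_info["person_names"]]
    should += [{"match": {"addresses": a}} for a in search_info["addresses"]]
    return {
        "size": size,  # the first preset number N
        "query": {
            "bool": {
                "should": should,
                "minimum_should_match": 1,  # at least one search term must match
            }
        },
    }


query = build_es_query(
    {"keywords": ["battery"], "person_names": ["SORIN FABIEN"], "addresses": ["Paris"]},
    size=50,
)
```

The resulting dictionary would be passed as the request body of a search against whichever index holds the article collection.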
Fig. 3 shows a schematic diagram of a process of determining candidate articles according to an embodiment of the present disclosure. As shown in Fig. 3, after the first article 30 is determined, search information 31 is determined according to its attribute information. The preset article collection 32 is then searched according to the search information 31, and N articles are obtained as candidate articles 33.
Further, each retrieved candidate article includes corresponding second person name information and second address information, which represent the names of the persons and the addresses related to the candidate article, respectively. The related persons and related addresses likewise differ with the type of the candidate article. For example, when the candidate article is an academic paper, the related persons may be the authors of the paper and the related address may be the address of the institution to which the authors belong. When the candidate article is a patent document, the related persons may be the inventors of the patent document and the related address may be the address of the applicant. Because a paper may have one or more authors, and a patent may have one or more inventors and applicants, the second person name information may include at least one second person name related to the candidate article, and the second address information may include at least one address related to the candidate article.
Step S40: determine a matching value between the first article and each candidate article according to the first person name information and the first address information, and each piece of second person name information and second address information.
In a possible implementation, the first article is associated with each candidate article through their respective related persons and related addresses, yielding a matching value between the first article and each candidate article. That is, the matching value between a candidate article and the first article can be obtained from the degree of matching between the first person name information and the second person name information, and the degree of matching between the first address information and the second address information.
Fig. 4 shows a flowchart of determining the matching value of each candidate article according to an embodiment of the present disclosure. As shown in Fig. 4, in a possible implementation, the process of determining the matching value of each candidate article includes the following steps:
Step S41: determine a first similarity between each candidate article and the first article according to the first person name information and each piece of second person name information.
In a possible implementation, the first person name information of the persons related to the first article is matched against the second person name information of the persons related to each candidate article, yielding the first similarity between each candidate article and the first article. The first similarity thus characterizes how well the persons related to the two articles match, for example the likelihood that the two articles have the same author, or the likelihood that the two articles describe events or information concerning the same person.
Optionally, the process of determining the first similarity between each candidate article and the first article may further include: determining at least one first person name included in the first person name information, and at least one second person name included in each piece of second person name information; for each first person name, determining at least one corresponding piece of synonym information and the relevance score corresponding to each piece of synonym information; and, for each piece of second person name information, determining the relationship between each second person name it includes and each piece of synonym information, and determining the first similarity according to the sum of the relevance scores of the synonym information corresponding to the second person names.
That is, the synonym information of each first person name in the first person name information is determined. When a second person name in the second person name information is identical to a piece of synonym information, the corresponding relevance score is determined. The sum of the relevance scores corresponding to the second person names in the second person name information gives the first similarity.
In a possible implementation, the process of determining the synonym information of each first person name in the first person name information includes: determining at least one piece of synonym information corresponding to each first person name, and determining the relevance score by comparing each piece of synonym information with the corresponding first person name to identify their differences.
Optionally, because person names are written differently in articles in different scripts, the synonym information corresponding to a first person name may be determined according to the script of the first person name; for example, the same English name may be written in different formats. In a possible implementation, in response to the first person name being an English name, the English surname and English given name it contains are determined, and the English surname and English given name of each first person name are input into multiple preset name format templates to obtain the corresponding synonym information.
That is, when the first person name is an English name, the name is split into an English given name and an English surname. Multiple possible written formats of English names are determined in advance, for example "surname·given name", "given name·surname", "given name, surname", and "given name·initial of surname", and each written format is defined as a corresponding name format template. The English given name and English surname obtained by splitting the first person name are then input into each name format template to obtain the corresponding synonym information. Optionally, the synonym information includes an entry identical to the corresponding first person name.
For example, the first person name "SORIN FABIEN" is split into the English surname "SORIN" and the English given name "FABIEN", which are input into the preset name format templates to obtain the corresponding synonym information. When the name format templates are "given name surname", "given name·surname", "given name, surname", and "given name·initial of surname", the output synonym information is "FABIEN SORIN", "FABIEN·SORIN", "FABIEN,SORIN", and "FABIEN,S", respectively.
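The template step can be illustrated with a short sketch. The template strings below are assumptions chosen to reproduce the four example outputs quoted above; the helper names `split_english_name` and `generate_synonyms`, and the convention that the surname comes first and is separated by whitespace (as in "SORIN FABIEN"), are likewise hypothetical.

```python
# Hypothetical name format templates reproducing the quoted example outputs.
NAME_TEMPLATES = [
    "{given} {surname}",
    "{given}·{surname}",
    "{given},{surname}",
    "{given},{surname_initial}",
]


def split_english_name(name):
    """Split 'SURNAME GIVEN' into (surname, given); assumes surname-first order."""
    surname, given = name.split(maxsplit=1)
    return surname, given


def generate_synonyms(name):
    """Return the original name followed by every distinct template rendering."""
    surname, given = split_english_name(name)
    fields = {"surname": surname, "given": given, "surname_initial": surname[0]}
    synonyms = [name]  # the synonym set may include the name itself
    for template in NAME_TEMPLATES:
        rendered = template.format(**fields)
        if rendered not in synonyms:
            synonyms.append(rendered)
    return synonyms


synonyms = generate_synonyms("SORIN FABIEN")
```

A name without whitespace would make `split_english_name` raise `ValueError`; a fuller implementation would need its own rule for single-token names.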
Further, to cover the case in which a person with a Chinese name publishes other articles under an English name written in pinyin, when the script of the first person name is Chinese, the name may be converted to the corresponding pinyin to obtain an English name, from which the corresponding synonym information is then determined. That is, in response to the first person name being a Chinese name, the Chinese surname and Chinese given name it contains are determined and converted by pinyin conversion into an English surname and an English given name.
For example, the first person name "张三" is split into the Chinese surname "张" and the Chinese given name "三", which are converted into the corresponding English surname "Zhang" and English given name "San". The English surname "Zhang" and the English given name "San" are then input into the preset name format templates to obtain the corresponding synonym information.
Optionally, in response to the first person name being a Chinese name, at least one corresponding piece of synonym information may also be obtained by character conversion, for example by converting the first person name from simplified characters to traditional characters, or from traditional characters to simplified characters. For example, when the first person name is "张三" in simplified characters, it can be converted to traditional characters to obtain the synonym information "張三".
In a possible implementation, after the synonym information is determined, determining the relevance score according to the differences between each piece of synonym information and the corresponding first person name may include: determining a first score corresponding to each first person name; determining second scores respectively corresponding to multiple preset kinds of difference; and, in response to at least one difference existing between a piece of synonym information and the corresponding first person name, determining the relevance score as the first score of that first person name minus the sum of the second scores of those differences.
Here, the first score is the score when the synonym information is identical to the first person name; a second score is the score lost when the first person name and the synonym information exhibit the corresponding difference; and the relevance score, calculated from the first score and the second scores, characterizes the degree of relevance between the synonym information and the first person name. Optionally, the differences may be defined according to the name format templates and may include, for example, different letter case, the surname abbreviated to its initial, a removed prefix, a removed suffix, a removed ".", a removed ",", and only the given name being retained. That is, when at least one difference exists between the synonym information and the corresponding person name, the sum of the second scores of those differences is determined, and the relevance score is obtained by subtracting that sum from the first score.
For example, the first score may be set to 1, the second score for abbreviating the surname to its initial to 0.3, the second score for different letter case to 0.2, and the second score for an added "·" to 0.1. When the first person name is "SORIN FABIEN", synonym information 1 is "Sorin Fabien", synonym information 2 is "S·FABIEN", and synonym information 3 is "Sorin·Fabien", the relevance score between the first person name and synonym information 1 is 0.8 (1 − 0.2), that with synonym information 2 is 0.6 (1 − 0.3 − 0.1), and that with synonym information 3 is 0.7 (1 − 0.2 − 0.1).
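The scoring rule can be written out as a short sketch using the example values above. The difference labels (`"surname_initial"`, `"case"`, `"added_dot"`) and the function name `relevance_score` are hypothetical, and the sketch takes the set of differences as given rather than detecting them from the strings.

```python
FIRST_SCORE = 1.0  # score when the synonym is identical to the first person name

# Second scores: the penalty for each preset kind of difference (example values).
SECOND_SCORES = {
    "surname_initial": 0.3,  # surname abbreviated to its initial
    "case": 0.2,             # different letter case
    "added_dot": 0.1,        # a "·" added between name parts
}


def relevance_score(differences):
    """First score minus the sum of the second scores of the observed differences."""
    penalty = sum(SECOND_SCORES[d] for d in differences)
    # Rounding only hides floating-point noise such as 0.7000000000000001.
    return round(FIRST_SCORE - penalty, 6)


score_1 = relevance_score({"case"})                          # "Sorin Fabien"
score_2 = relevance_score({"surname_initial", "added_dot"})  # "S·FABIEN"
score_3 = relevance_score({"case", "added_dot"})             # "Sorin·Fabien"
```

An empty set of differences returns the first score unchanged, which corresponds to a synonym identical to the first person name.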
Further, after the synonym information corresponding to each second person name included in the second person name information is determined, the relevance scores of those synonyms are summed to obtain the first similarity between the corresponding candidate article and the first article.
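A minimal sketch of this aggregation is shown below. It assumes the synonym information of all first person names has already been collected into one dictionary mapping each synonym string to its relevance score; that dictionary layout and the name `first_similarity` are illustrative assumptions, and the scores reuse the example values 1, 0.8, and 0.6.

```python
def first_similarity(synonym_scores, second_names):
    """Sum the relevance scores of the second person names that equal a synonym."""
    return sum(synonym_scores.get(name, 0.0) for name in second_names)


# Synonyms of the first person name "SORIN FABIEN" with example relevance scores.
synonym_scores = {
    "SORIN FABIEN": 1.0,
    "Sorin Fabien": 0.8,
    "S·FABIEN": 0.6,
}

# "JANE DOE" matches no synonym and therefore contributes nothing.
similarity = first_similarity(synonym_scores, ["Sorin Fabien", "JANE DOE"])
```

A candidate article whose person names match no synonym at all receives a first similarity of zero under this sketch.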
Fig. 5 shows a schematic diagram of determining the synonym information of a first person name according to an embodiment of the present disclosure. As shown in Fig. 5, for each first person name 50 in the first person name information, its script 51 is determined first. When the script 51 of the first person name 50 is English, the first person name 50 is split into an English given name and English surname 52, and multiple pieces of corresponding synonym information 55 are determined from the English given name and English surname 52 and multiple preset name format templates 54. When the script 51 of the first person name 50 is Chinese, the first person name 50 is split into a Chinese given name and Chinese surname 53, which are converted into the corresponding English given name and English surname 52; multiple pieces of corresponding synonym information 55 are then determined from the English given name and English surname 52 and the preset name format templates 54.
In a possible implementation, when the script of the first person name 50 is Chinese, another Chinese written form of the Chinese surname and Chinese given name 53 included in the first person name 50 may also be determined directly by character conversion, such as conversion between simplified and traditional characters, to obtain the corresponding synonym information 55.
Step S42: determine a second similarity between each candidate article and the first article according to the first address information and each piece of second address information.
In a possible implementation, the second similarity between each candidate article and the first article characterizes the degree of relevance of the corresponding related addresses. Optionally, the second similarity may be determined by calculating the edit distance between the first address information and each piece of second address information. That is, the edit distance between the address information of each candidate article and that of the first article is taken as the corresponding second similarity.
Further, when the first address information includes multiple first addresses and the second address information includes multiple second addresses, the edit distance between each first address and each second address is calculated, and their weighted sum gives the corresponding second similarity.
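The address comparison can be sketched with a standard Levenshtein edit distance and a weighted sum over all address pairs. The source does not specify the weights, so equal weights are assumed by default, and the function names are hypothetical. The sketch also follows the text literally in using the raw edit distance as the second similarity, so a smaller value means closer addresses; whether the value is normalized or inverted before later use is not stated in the source.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def second_similarity(first_addresses, second_addresses, weights=None):
    """Weighted sum of edit distances over every (first, second) address pair."""
    pairs = [(a, b) for a in first_addresses for b in second_addresses]
    if not pairs:
        return 0.0
    if weights is None:
        weights = [1.0 / len(pairs)] * len(pairs)  # assumed: equal weights
    return sum(w * edit_distance(a, b) for w, (a, b) in zip(weights, pairs))


value = second_similarity(["Paris"], ["Paris", "Pari"])
```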
Step S43: determine the matching value according to the first similarity and the second similarity corresponding to each candidate article.
In a possible implementation, the matching value that characterizes how well each candidate article matches the first article is determined from the first similarity and the second similarity of that candidate article. Optionally, the matching value may be calculated as a weighted sum of the first similarity and the second similarity.
Step S50: determine a second article among the candidate articles according to the corresponding matching values.
In a possible implementation, after the matching value of each candidate article is determined, the second article is determined among the candidate articles according to those matching values. Optionally, the second preset number of candidate articles with the largest matching values may be determined as second articles. Alternatively, the candidate articles whose matching values are greater than a matching threshold may be determined as second articles.
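The weighted-sum matching value and the two selection rules can be sketched as follows. The default weights of 0.5, the representation of candidates as `(article_id, matching_value)` pairs, and all function names are assumptions made for illustration only.

```python
def match_value(first_sim, second_sim, w_name=0.5, w_addr=0.5):
    """Weighted sum of the first and second similarities (weights are assumed)."""
    return w_name * first_sim + w_addr * second_sim


def select_top_k(scored, k):
    """Keep the k candidates with the largest matching values."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [article for article, _ in ranked[:k]]


def select_above_threshold(scored, threshold):
    """Keep every candidate whose matching value exceeds the threshold."""
    return [article for article, value in scored if value > threshold]


scored = [("A", 0.9), ("B", 0.4), ("C", 0.7)]
top_two = select_top_k(scored, 2)
above_half = select_above_threshold(scored, 0.5)
```

The top-k rule always returns a fixed number of second articles, while the threshold rule may return none or many depending on how the matching values are distributed.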
Optionally, when the application scenario of the embodiment of the present disclosure is searching for academic papers related to a patent document, the finally determined second article is an academic paper.
The embodiments of the present disclosure can search for related candidate articles using the attribute information in the first article, and associate the first article with the candidate articles through the person names and addresses they contain, thereby obtaining highly relevant second articles and improving the accuracy of related-article search results. Splitting each person name and generating synonym information in different written forms avoids missing related articles during the search because the same name is written differently.
Although the related article search method has been described above with Fig. 1 as an example, those skilled in the art will understand that the present disclosure is not limited thereto. In fact, users can flexibly configure the related article search method of the embodiments of the present disclosure according to personal preference and/or the actual application scenario, provided that related articles can be searched on the basis of person names and addresses and the accuracy of the search results is improved.
Fig. 6 shows a schematic diagram of a related article search apparatus according to an embodiment of the present disclosure. As shown in Fig. 6, the related article search apparatus 600 includes:
a first article determination module 60, configured to determine a first article including attribute information, the attribute information including at least first person name information and first address information;
a search information determination module 61, configured to determine search information corresponding to the first article according to the attribute information;
an article search module 62, configured to search an article collection for a first preset number of candidate articles according to the search information, each candidate article including corresponding second person name information and second address information;
a data matching module 63, configured to determine a matching value between the first article and each candidate article according to the first person name information and the first address information, and each piece of second person name information and second address information;
a second article determination module 64, configured to determine a second article among the candidate articles according to the corresponding matching values.
In a possible implementation, the data matching module includes:
a first similarity determination submodule, configured to determine a first similarity between each candidate article and the first article according to the first person name information and each piece of second person name information;
a second similarity determination submodule, configured to determine a second similarity between each candidate article and the first article according to the first address information and each piece of second address information;
a matching value determination submodule, configured to determine the matching value according to the first similarity and the second similarity corresponding to each candidate article.
In a possible implementation, the first similarity determination submodule includes:
a person name determination unit, configured to determine at least one first person name included in the first person name information, and at least one second person name included in each piece of second person name information;
a synonym determination unit, configured to determine, for each first person name, at least one corresponding piece of synonym information and the relevance score corresponding to each piece of synonym information;
a person name association unit, configured to determine, for each piece of second person name information, the relationship between each second person name it includes and each piece of synonym information;
a first similarity determination unit, configured to determine the first similarity according to the sum of the relevance scores of the synonym information corresponding to the second person names.
In a possible implementation, the synonym determination unit includes:
a synonym determination subunit, configured to determine at least one piece of synonym information corresponding to each first person name;
a relevance score determination subunit, configured to determine the relevance score by comparing each piece of synonym information with the corresponding first person name to identify their differences.
In a possible implementation, the synonym determination subunit includes:
an English information determination subunit, configured to determine, in response to the first person name being an English name, the English surname and English given name it contains;
a synonym generation subunit, configured to input the English surname and English given name corresponding to each first person name into multiple preset name format templates to obtain the corresponding synonym information.
In a possible implementation, the synonym determination subunit further includes:
a Chinese information determination subunit, configured to, in response to the first person name being a Chinese name, perform character conversion on the Chinese name to obtain at least one corresponding piece of synonym information;
or to determine the Chinese surname and Chinese given name it contains;
an English information conversion subunit, configured to convert the Chinese surname and Chinese given name into an English surname and English given name by pinyin conversion.
In a possible implementation, the relevance score determination subunit includes:
a first score determination unit, configured to determine the first score corresponding to each first person name;
a second score determination subunit, configured to determine the second scores respectively corresponding to multiple preset kinds of difference;
a relevance score calculation subunit, configured to, in response to at least one difference existing between the synonym information and the corresponding first person name, determine the relevance score as the first score of that first person name minus the sum of the second scores corresponding to the differences.
In a possible implementation, the second similarity determination submodule includes:
a second similarity determination unit, configured to calculate the edit distance between the first address information and each piece of second address information to obtain the second similarity between each candidate article and the first article.
In a possible implementation, the second article determination module includes:
a first screening submodule, configured to determine the second preset number of candidate articles with the largest matching values as second articles.
In a possible implementation, the second article determination module includes:
a second screening submodule, configured to determine the candidate articles whose matching values are greater than a matching threshold as second articles.
In a possible implementation, the attribute information further includes article attributes, and the search information determination module includes:
a feature extraction submodule, configured to extract feature information from the article attributes, the feature information including at least one keyword corresponding to the first article;
a search information determination submodule, configured to determine the search information according to the feature information, the first person name information, and the first address information.
In a possible implementation, the first article is a patent document and the second article is an academic paper.
In a possible implementation, the article attributes include at least one of the abstract, the claims, the description, the patent title, the technical field, the background, the summary of the invention, and the detailed description; the first person name information includes at least one inventor name; and the first address includes the applicant's address.
In a possible implementation, the second person name information includes at least one paper author name, and the second address includes the address of the institution to which the paper author belongs.
FIG. 7 shows a schematic diagram of an electronic device 700 according to an embodiment of the present disclosure. For example, the electronic device 700 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Referring to FIG. 7, the electronic device 700 may include one or more of the following components: a processing component 702, a memory 704, a power supply component 706, a multimedia component 708, an audio component 710, an input/output (I/O) interface 712, a sensor component 714, and a communication component 716.
The processing component 702 generally controls the overall operation of the electronic device 700, such as operations associated with display, telephone calls, data communication, camera operation and recording. The processing component 702 may include one or more processors 720 that execute instructions to complete all or some of the steps of the method described above. In addition, the processing component 702 may include one or more modules that facilitate interaction between the processing component 702 and other components. For example, the processing component 702 may include a multimedia module to facilitate interaction between the multimedia component 708 and the processing component 702.
The memory 704 is configured to store various types of data to support operation of the electronic device 700. Examples of such data include instructions for any application or method operating on the electronic device 700, contact data, phonebook data, messages, pictures, videos, and the like. The memory 704 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disk.
The power supply component 706 provides power to the various components of the electronic device 700. The power supply component 706 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the electronic device 700.
The multimedia component 708 includes a screen that provides an output interface between the electronic device 700 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with that action. In some embodiments, the multimedia component 708 includes a front camera and/or a rear camera. When the electronic device 700 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or may have focal length and optical zoom capability.
The audio component 710 is configured to output and/or input audio signals. For example, the audio component 710 includes a microphone (MIC), which is configured to receive external audio signals when the electronic device 700 is in an operating mode, such as a call mode, a recording mode or a voice recognition mode. The received audio signals may be further stored in the memory 704 or sent via the communication component 716. In some embodiments, the audio component 710 also includes a speaker for outputting audio signals.
The I/O interface 712 provides an interface between the processing component 702 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. The buttons may include, but are not limited to, a home button, volume buttons, a start button and a lock button.
The sensor component 714 includes one or more sensors for providing status assessments of various aspects of the electronic device 700. For example, the sensor component 714 can detect the open/closed state of the electronic device 700 and the relative positioning of components, such as the display and keypad of the electronic device 700. The sensor component 714 can also detect a change in position of the electronic device 700 or of one of its components, the presence or absence of user contact with the electronic device 700, the orientation or acceleration/deceleration of the electronic device 700, and a change in its temperature. The sensor component 714 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 714 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 714 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
The communication component 716 is configured to facilitate wired or wireless communication between the electronic device 700 and other devices. The electronic device 700 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 716 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 716 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
In an exemplary embodiment, the electronic device 700 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components, for performing the method described above.
In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 704 including computer program instructions, which can be executed by the processor 720 of the electronic device 700 to complete the method described above.
FIG. 8 shows a schematic diagram of another electronic device 800 according to an embodiment of the present disclosure. For example, the electronic device 800 may be provided as a server. Referring to FIG. 8, the electronic device 800 includes a processing component 822, which further includes one or more processors, and memory resources represented by a memory 832 for storing instructions executable by the processing component 822, such as application programs. The application programs stored in the memory 832 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 822 is configured to execute the instructions so as to perform the method described above.
The electronic device 800 may also include a power supply component 826 configured to perform power management of the electronic device 800, a wired or wireless network interface 850 configured to connect the electronic device 800 to a network, and an input/output (I/O) interface 858. The electronic device 800 can operate based on an operating system stored in the memory 832, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 832 including computer program instructions, which can be executed by the processing component 822 of the electronic device 800 to complete the method described above.
The present disclosure may be a system, a method and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present disclosure.
The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove with instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as a transient signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse passing through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to respective computing/processing devices, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
The computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA) or a programmable logic array (PLA), is personalized using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present disclosure.
Aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems) and computer program products according to embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; the instructions cause a computer, a programmable data processing apparatus and/or other devices to operate in a particular manner, so that the computer-readable medium storing the instructions constitutes an article of manufacture that includes instructions implementing aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus or another device, so that a series of operational steps is performed on the computer, other programmable data processing apparatus or other device to produce a computer-implemented process, whereby the instructions executed on the computer, other programmable data processing apparatus or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the accompanying drawings show the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of instructions, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions.
The embodiments of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application or their technical improvement over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (20)

  1. A related article search method, comprising:
    determining a first article that includes attribute information, the attribute information including at least first person name information and first address information;
    determining search information corresponding to the first article according to the attribute information;
    searching an article collection for a first preset number of candidate articles according to the search information, each candidate article including corresponding second person name information and second address information;
    determining a matching value between the first article and each candidate article according to the first person name information and the first address information and each piece of second person name information and second address information;
    determining a second article among the candidate articles according to the corresponding matching values.
  2. The method according to claim 1, wherein determining the matching value between the first article and each candidate article according to the first person name information and the first address information and each piece of second person name information and second address information comprises:
    determining a first similarity between each candidate article and the first article according to the first person name information and each piece of second person name information;
    determining a second similarity between each candidate article and the first article according to the first address information and each piece of second address information;
    determining the matching value according to the first similarity and the second similarity corresponding to each candidate article.
  3. The method according to claim 2, wherein determining the first similarity between each candidate article and the first article according to the first person name information and each piece of second person name information comprises:
    determining at least one first person name included in the first person name information, and at least one second person name included in each piece of second person name information;
    for each first person name, determining at least one corresponding piece of synonym information and a relevance score corresponding to each piece of synonym information;
    for each piece of second person name information, determining the relationship between each second person name included therein and each piece of synonym information;
    determining the first similarity according to the sum of the relevance scores of the synonym information corresponding to the second person names.
  4. The method according to claim 3, wherein, for each first person name, determining the at least one corresponding piece of synonym information and the relevance score corresponding to each piece of synonym information comprises:
    determining at least one piece of synonym information corresponding to each first person name;
    determining the relevance score corresponding to each piece of synonym information by comparing the differences between that synonym information and the corresponding first person name.
  5. The method according to claim 4, wherein determining the at least one piece of synonym information corresponding to each first person name comprises:
    in response to the first person name being an English name, determining the English surname and English given name included in the English name;
    inputting the English surname and English given name corresponding to each first person name into a plurality of preset name format templates, to obtain the corresponding synonym information.
  6. The method according to claim 5, wherein determining the at least one piece of synonym information corresponding to each first person name further comprises:
    in response to the first person name being a Chinese name, performing character conversion on the Chinese name to obtain at least one corresponding piece of synonym information;
    or, determining the Chinese surname and Chinese given name included in the Chinese name;
    converting the Chinese surname and Chinese given name into an English surname and English given name by pinyin conversion.
  7. The method according to any one of claims 4 to 6, wherein determining the relevance score by comparing the differences between each piece of synonym information and the corresponding first person name comprises:
    determining a first score corresponding to each first person name;
    determining second scores respectively corresponding to a plurality of preset differences;
    in response to at least one difference existing between the synonym information and the corresponding first person name, determining, as the relevance score corresponding to the synonym information, the first score corresponding to the first person name minus the sum of the second scores corresponding to those differences.
  8. The method according to any one of claims 2 to 6, wherein determining the second similarity between each candidate article and the first article according to the first address information and each piece of second address information comprises:
    calculating the edit distance between the first address information and each piece of second address information, to obtain the second similarity between each candidate article and the first article.
  9. The method according to any one of claims 1 to 8, wherein determining the second article among the candidate articles according to the corresponding matching values comprises:
    determining the second preset number of candidate articles with the largest matching values as the second articles.
  10. The method according to any one of claims 1 to 9, wherein determining the second article among the candidate articles according to the corresponding matching values comprises:
    determining the candidate articles whose corresponding matching values are greater than a matching threshold as the second articles.
  11. The method according to claim 10, wherein the attribute information further includes article attributes, and determining the search information corresponding to the first article according to the attribute information comprises:
    extracting feature information from the article attributes, the feature information including at least one keyword corresponding to the first article;
    determining the search information corresponding to the first article according to the feature information, the first person name information and the first address information.
  12. The method according to claim 11, wherein the first article is a patent document and the second article is an academic paper.
  13. The method according to claim 11 or 12, wherein the article attributes include at least one of the abstract, the claims, the description, the patent title, the technical field, the background art, the summary of the invention, and the detailed description; the first person name information includes at least one inventor name, and the first address includes the applicant's address.
  14. The method according to claim 13, wherein the second person name information includes at least one paper author name, and the second address includes the address of the institution to which the paper author belongs.
  15. 一种相关文章搜索装置,包括:A device for searching related articles, comprising:
    第一文章确定模块,用于确定包括属性信息的第一文章,所述属性信息至少包括第一人名信息和第一地址信息;The first article determination module is configured to determine a first article including attribute information, the attribute information including at least first name information and first address information;
    搜索信息确定模块,用于根据所述属性信息确定所述第一文章对应的搜索信息;a search information determination module, configured to determine search information corresponding to the first article according to the attribute information;
    文章搜索模块,用于根据所述搜索信息在文章集合中搜索第一预设数量个候选文章,各所述候选文章中包括对应的第二人名信息和第二地址信息;An article search module, configured to search for a first preset number of candidate articles in the article collection according to the search information, and each of the candidate articles includes corresponding second name information and second address information;
    数据匹配模块,用于根据所述第一人名信息和第一地址信息与各所述第二人名信息和第二地址信息,确定所述第一文章和各所述候选文章的匹配值;A data matching module, configured to determine the matching value of the first article and each of the candidate articles according to the first name information and first address information and each of the second name information and second address information;
    第二文章确定模块,用于根据各所述候选文章的匹配值在各所述候选文章中确定第二文章。The second article determining module is configured to determine a second article in each of the candidate articles according to the matching value of each of the candidate articles.
  16. 根据权利要求15所述的相关文章搜索装置,其中,所述数据匹配模块包括:The related article search device according to claim 15, wherein said data matching module comprises:
    第一相似度确定子模块,用于根据所述第一人名信息和各所述第二人名信息,确定各所述候选文章与所述第一文章的第一相似度;The first similarity determining submodule is used to determine the first similarity between each of the candidate articles and the first article according to the first name information and each of the second name information;
    第二相似度确定子模块,用于根据所述第一地址信息和各所述第二地址信息,确定各所述候选文章与所述第一文章的第二相似度;A second similarity determining submodule, configured to determine a second similarity between each of the candidate articles and the first article according to the first address information and each of the second address information;
    匹配值确定子模块,用于根据各所述候选文章对应的所述第一相似度和所述第二相似度确定匹配值。A matching value determination submodule is configured to determine a matching value according to the first similarity and the second similarity corresponding to each of the candidate articles.
  17. The related article search apparatus according to claim 16, wherein the first similarity determination submodule comprises:
    a person name determination unit, configured to determine at least one first person name included in the first person name information, and at least one second person name included in each piece of second person name information;
    a synonym determination unit, configured to determine, for each first person name, at least one piece of corresponding synonym information and a relevance score corresponding to each piece of synonym information;
    a person name association unit, configured to determine, for each piece of second person name information, the relationship between each second person name included therein and each piece of synonym information; and
    a first similarity determination unit, configured to determine the first similarity according to the sum of the relevance scores of the synonym information corresponding to each second person name.
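The units of claim 17 can be sketched as follows. The synonym table is hypothetical input data (each first person name mapped to its synonyms and relevance scores), and treating the "relationship" between a second person name and a synonym as a case-insensitive exact match is an assumption; the claim does not define it.

```python
from typing import Dict, Iterable

# Hypothetical structure: first person name -> {synonym: relevance score}.
SynonymScores = Dict[str, Dict[str, float]]


def name_similarity(
    first_names: Iterable[str],
    second_names: Iterable[str],
    synonyms: SynonymScores,
) -> float:
    """Sum the relevance scores of synonyms that match a second person name."""
    # Normalise the candidate article's names once for lookup.
    second = {name.strip().lower() for name in second_names}
    total = 0.0
    for name in first_names:
        # Each first person name contributes the scores of its matched synonyms.
        for synonym, score in synonyms.get(name, {}).items():
            if synonym.strip().lower() in second:
                total += score
    return total
```

With a hypothetical table such as `{"Li Wei": {"Li Wei": 1.0, "W. Li": 0.6}}`, a candidate listing only the abbreviated form "W. Li" would receive the lower score attached to that synonym.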
  18. The related article search apparatus according to claim 16 or 17, wherein the second similarity determination submodule comprises:
    a second similarity determination unit, configured to calculate an edit distance between the first address information and each piece of second address information, to obtain the second similarity between each of the candidate articles and the first article.
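The edit-distance computation of claim 18 can be sketched with the standard Levenshtein dynamic program. Converting the distance into a similarity by normalising with the longer string length is an assumption; the claim only states that the second similarity is obtained from the edit distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, keeping only one previous row of the table."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def address_similarity(first_address: str, second_address: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means identical (assumed normalisation)."""
    longest = max(len(first_address), len(second_address))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(first_address, second_address) / longest
```

Normalising keeps long and short addresses comparable, so the result can be combined directly with a name-based score.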
  19. An electronic device, comprising:
    a processor; and
    a memory for storing processor-executable instructions;
    wherein the processor is configured to invoke the instructions stored in the memory to perform the method according to any one of claims 1 to 14.
  20. A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 14.
PCT/CN2022/129963 2021-11-04 2022-11-04 Related article search method and apparatus, electronic device, and storage medium WO2023078414A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111301509.6 2021-11-04
CN202111301509.6A CN113987128A (en) 2021-11-04 2021-11-04 Related article searching method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023078414A1 (en) 2023-05-11

Family

ID=79746501

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/129963 WO2023078414A1 (en) 2021-11-04 2022-11-04 Related article search method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN113987128A (en)
WO (1) WO2023078414A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113987128A (en) * 2021-11-04 2022-01-28 智慧芽信息科技(苏州)有限公司 Related article searching method and device, electronic equipment and storage medium
CN114969391B (en) * 2022-07-29 2022-11-18 华中科技大学同济医学院附属协和医院 Article data searching method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108932320A (en) * 2018-06-27 2018-12-04 广州优视网络科技有限公司 Article search method, apparatus and electronic equipment
CN109670014A (en) * 2018-11-21 2019-04-23 北京大学 A kind of Authors of Science Articles name disambiguation method of rule-based matching and machine learning
CN109918670A (en) * 2019-03-12 2019-06-21 重庆誉存大数据科技有限公司 A kind of article duplicate checking method and system
CN112613310A (en) * 2021-01-04 2021-04-06 成都颜创启新信息技术有限公司 Name matching method and device, electronic equipment and storage medium
CN113535952A (en) * 2021-07-13 2021-10-22 六棱镜(杭州)科技有限公司 Intelligent matching data processing method based on artificial intelligence
CN113987128A (en) * 2021-11-04 2022-01-28 智慧芽信息科技(苏州)有限公司 Related article searching method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113987128A (en) 2022-01-28

Similar Documents

Publication Publication Date Title
WO2020029966A1 (en) Method and device for video processing, electronic device, and storage medium
WO2023078414A1 (en) Related article search method and apparatus, electronic device, and storage medium
US8244284B2 (en) Mobile communication device and the operating method thereof
RU2589873C2 (en) Input processing method and apparatus
WO2021128880A1 (en) Speech recognition method, device, and device for speech recognition
CN107305438B (en) Method and device for sorting candidate items
CN107092424B (en) Display method and device of error correction items and device for displaying error correction items
WO2019109663A1 (en) Cross-language search method and apparatus, and apparatus for cross-language search
CN108304412B (en) Cross-language search method and device for cross-language search
CN110019675B (en) Keyword extraction method and device
WO2023061276A1 (en) Data recommendation method and apparatus, electronic device, and storage medium
CN111538830B (en) French searching method, device, computer equipment and storage medium
WO2021082463A1 (en) Data processing method and apparatus, electronic device and storage medium
CN107424612B (en) Processing method, apparatus and machine-readable medium
CN107797676B (en) Single character input method and device
CN107291260B (en) Information input method and device for inputting information
CN109783244B (en) Processing method and device for processing
WO2023092975A1 (en) Image processing method and apparatus, electronic device, storage medium, and computer program product
CN106959970B (en) Word bank, processing method and device of word bank and device for processing word bank
JP7208968B2 (en) Information processing method, device and storage medium
WO2021082461A1 (en) Storage and reading method and apparatus, electronic device, and storage medium
CN109558017B (en) Input method and device and electronic equipment
CN110929122B (en) Data processing method and device for data processing
CN108227952B (en) Method and system for generating custom word and device for generating custom word
CN109426359B (en) Input method, device and machine readable medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22889435

Country of ref document: EP

Kind code of ref document: A1