CN114970543A - A Semantic Analysis Method for Crowdsourcing Design Resources - Google Patents


Info

Publication number
CN114970543A
Authority
CN
China
Prior art keywords
independent
relationship
predicate
semantics
core
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210543747.6A
Other languages
Chinese (zh)
Other versions
CN114970543B (en)
Inventor
于树松
郭保琪
刘晓菲
石硕
丁香乾
杨宁
刘国敬
牛迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China
Priority to CN202210543747.6A
Publication of CN114970543A
Application granted
Publication of CN114970543B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/253: Grammatical analysis; Style critique
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/268: Morphological analysis
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a semantic analysis method for crowdsourced design resources, comprising: step 1, performing word segmentation and part-of-speech tagging on the short sentences of the crowdsourced design resources; step 2, splitting the short sentences processed in step 1 into multiple independent short sentences; step 3, processing each independent short sentence as follows: performing dependency analysis, extracting the independent functional components of the independent short sentence using coordinate relations and independent structures, and building a multi-level semantic model for each independent functional component. By segmenting the short sentences, tagging parts of speech, splitting out independent short sentences, dividing each independent short sentence into independent functional components, and building a multi-level semantic model for each component, the invention converts unstructured natural-language short-sentence descriptions into a structured set of relations. This achieves unified modeling of crowdsourced design resources and is of considerable value for subsequent retrieval and matching.

Description

A Semantic Analysis Method for Crowdsourcing Design Resources

Technical Field

The invention belongs to the technical field of computer data processing and, in particular, relates to a semantic analysis method for crowdsourced design resources.

Background

On Internet crowdsourcing platforms, crowdsourced design resources are mostly expressed as a mixture of text, numbers, and image data, and the self-organizing nature of crowdsourcing services means that the content of different design resources varies widely. To manage design resources well, the service content they involve must be given a structured representation.

Summary of the Invention

The invention proposes a semantic analysis method for crowdsourced service resources. Taking the design resources on Internet crowdsourcing platforms as its object of study, it uses phrase segmentation and dependency analysis to divide crowdsourced design resources into independent functional components and builds a multi-level semantic representation for each component. This converts unstructured natural-language short-sentence descriptions into a structured set of relations and enables unified modeling of crowdsourced design resources.

The invention is implemented with the following technical solution:

A semantic analysis method for crowdsourced design resources is proposed, comprising:

Step 1. Perform word segmentation and part-of-speech tagging on the short sentences of the crowdsourced design resources;

Step 2. Split the short sentences processed in step 1 into multiple independent short sentences;

Step 3. Process each independent short sentence as follows:

1) Dependency analysis;

2) Extract the independent functional components of the independent short sentence using coordinate relations and independent structures;

3) Build a multi-level semantic model for each independent functional component:

(1) Traverse the independent functional component to find the core verbal predicate, and obtain the first-level semantics from the detected generalized verb-object relations. The generalized verb-object relations include: the direct-object relation; the indirect-object relation; the fronted-object relation; an adverbial relation between the core verbal predicate and a noun phrase it governs; a subject-predicate relation between the core verbal predicate and a noun phrase it governs; and, when the object of the core verbal predicate is missing, a modifier expressed through an attributive relation;

(2) Detect the modifiers of the predicate head word and the object head word in the first-level semantics to obtain the second-level semantics;

(3) Detect the modifiers of the head words of the second-level semantics to obtain the third-level semantics;

(4) Attach any detected semantic components above the third level to the head word they belong to.

Further, in step 2, the short sentence is split into multiple independent short sentences at spaces, enumeration commas (、), commas, and forward and backward slashes.

Further, extracting the independent functional components of an independent short sentence using coordinate relations and independent-structure relations comprises:

(1) When the detected coordinate relation lies in the generalized object-modifier region, the sentence is divided, on the basis of the core verbal predicate, into two independent functional components of equal standing;

(2) When the detected coordinate relation lies between core verbal predicates, the sentence is divided into two independent functional components of equal standing;

(3) When a coordinate relation and an independent structure are both present: if the independent structure is a verbal predicate and a coordinate relation exists, the sentence is divided, on the basis of the verbal predicate of the independent structure, into two independent functional components of equal standing; if the independent structure is not a verbal predicate, no independent functional component is created.

Further, building a multi-level semantic model for each independent functional component comprises:

When the first-level semantics is found to have no core object, the modifier of the predicate is treated as its object and built into the first-level semantics.

Compared with the prior art, the advantages and positive effects of the invention are as follows. The proposed method segments the short sentences of crowdsourced design resources into words, tags parts of speech, splits out independent short sentences, divides each independent short sentence into independent functional components, and builds a multi-level semantic model for each component. It thereby converts unstructured natural-language short-sentence descriptions into a structured set of relations, achieves unified modeling of crowdsourced design resources, and is of considerable value for subsequent retrieval and matching.

Other features and advantages of the invention will become clearer after reading the detailed description of the embodiments together with the accompanying drawings.

Brief Description of the Drawings

To explain the technical solutions of the embodiments more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Figure 1 is a schematic diagram of the analysis steps of the semantic analysis method for crowdsourced design resources proposed by the invention;

Figure 2 is the first schematic diagram of splitting independent short sentences in the invention;

Figure 3 is the second schematic diagram of splitting independent short sentences in the invention;

Figure 4 is the third schematic diagram of splitting independent short sentences, based on a dependency analysis tree, in the invention;

Figure 5 is the first schematic diagram of dividing independent functional components based on a dependency analysis tree in the invention;

Figure 6 is the second schematic diagram of dividing independent functional components based on a dependency analysis tree in the invention;

Figure 7 is the third schematic diagram of dividing independent functional components based on a dependency analysis tree in the invention;

Figure 8 is a schematic diagram of the steps for building the multi-level semantic model in the invention;

Figure 9 is the first schematic diagram of building the multi-level semantic model in the invention;

Figure 10 is the second schematic diagram of building the multi-level semantic model in the invention;

Figure 11 is the third schematic diagram of building the multi-level semantic model in the invention;

Figure 12 is the fourth schematic diagram of building the multi-level semantic model in the invention.

Detailed Description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

In the description of the invention, it should be understood that orientation or position terms such as "center", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", and "outer" are based on the orientations or positions shown in the drawings. They are used only to simplify the description and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, so they should not be construed as limiting the invention.

In the description of the invention, it should be noted that, unless otherwise expressly specified and limited, the terms "installed", "connected", and "coupled" are to be understood broadly; for example, a connection may be fixed, detachable, or integral. A person of ordinary skill in the art can understand the specific meaning of these terms in the invention according to the circumstances. In the description of the above embodiments, specific features, structures, materials, or characteristics may be combined in any suitable manner in any one or more embodiments or examples.

The terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more of that feature. In the description of the invention, unless otherwise stated, "multiple" means two or more.

The semantic analysis method for crowdsourced design resources proposed by the invention, as shown in Figure 1, comprises:

Step S1: Perform word segmentation and part-of-speech tagging on the short sentences of the crowdsourced design resources.

In Chinese, characters are the basic units that make up words, but the meaning of a phrase or sentence has to be understood word by word. When processing a text, a computer must therefore first segment each sentence, that is, automatically identify every word and insert boundary markers to separate the words.

In engineering practice, a common approach is to segment with a rule-based algorithm and use statistical methods as a supplement. This segments text efficiently and accurately while also handling new and out-of-vocabulary words, and is known as hybrid segmentation.

For example, for the short sentence "各种毛笔字体书写、设计" (writing and design of various brush-calligraphy fonts), which describes the function of a crowdsourced resource, the segmentation result is: [各种, 毛笔, 字体, 书写, 、, 设计].

After segmentation, the words must be tagged with their parts of speech. Part-of-speech tagging determines the part of speech of each word in the sentence and labels it, classifying each word as a noun, verb, adjective, particle, and so on.

In Chinese, many words do not have a single part of speech and take different parts of speech in different sentences. On the other hand, most words have only one or two parts of speech, and where a word has two or more, one of them is used far more often than the others. There is no unified standard for Chinese part-of-speech tagging; the two mainstream tag sets are the Peking University tag set and the Penn Treebank tag set. The following is an example of part-of-speech tagging:

The tagging result for "各种毛笔字体书写、设计" is: [各种/rz, 毛笔/n, 字体/n, 书写/v, 、/w, 设计/vn].

The part-of-speech tags of the words involved fall mainly into the following categories: 1) noun (n); 2) verb (v); 3) punctuation (w); 4) adjective (a); 5) conjunction (c); 6) pronoun (r); 7) measure word (q); 8) other (o). Finer-grained tags include verbal noun (vn), intransitive verb (vi), food (nf), and demonstrative pronoun (rz).
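The patent does not name a particular segmenter or tagger, so the following is only a minimal sketch of the rule-based half of step S1: forward maximum matching against a small lexicon, followed by a lexicon lookup for the tag. The lexicon `LEXICON` and the function names are hypothetical and cover only the example sentence above; a real system would add the statistical component the text mentions.

```python
# Toy lexicon mapping each known word to a part-of-speech tag.
# Hypothetical: it contains only the words of the example sentence.
LEXICON = {
    "各种": "rz",
    "毛笔": "n",
    "字体": "n",
    "书写": "v",
    "设计": "vn",
    "、": "w",
}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def segment(text):
    """Forward maximum matching: at each position take the longest lexicon word."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when no lexicon word matches.
            if candidate in LEXICON or size == 1:
                tokens.append(candidate)
                i += size
                break
    return tokens


def tag(tokens):
    """Attach a tag to each token; unknown tokens get the catch-all tag 'o' (other)."""
    return [(token, LEXICON.get(token, "o")) for token in tokens]


if __name__ == "__main__":
    words = segment("各种毛笔字体书写、设计")
    print(words)
    print(tag(words))
```

A pure dictionary lookup cannot resolve words with several parts of speech, which is one reason the text pairs rules with statistics.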

Step S2: Split the short sentences processed in step S1 into multiple independent short sentences.

Punctuation is very common in the design resource information on crowdsourcing websites, for example "我可以设计标志 图纸产品外形等" (I can design logos, drawings, product shapes, etc.), "红酒、食品、实物拍摄" (wine, food, and product photography), and "首页设计/专业PS抠图/去水印/广告图设计/宝贝描述" (home page design / professional Photoshop cutouts / watermark removal / ad image design / product descriptions). These marks mostly serve to separate short sentences, so the invention splits a short sentence into multiple independent short sentences at spaces, enumeration commas (、), commas, forward and backward slashes, and the like. Short sentences that contain no such delimiters need no splitting.

For example:

1) A phrase split at slashes: "网站建设/平面设计/百度排名/微博营销 用品质说话!" (website construction / graphic design / Baidu ranking / Weibo marketing, let quality speak!); the result is shown in Figure 2.

2) A phrase split at spaces: "食品 菜式甜点甜品 饮料饮品汽水拍照 静物产品拍摄拍照" (food, dishes and desserts, beverage and soda photography, still-life product photography); the result is shown in Figure 3.

3) A phrase split at an enumeration comma: "承接各种宣传册、菜单排版设计" (we take on layout design for all kinds of brochures and menus); the result is shown in Figure 4.
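The delimiter-based splitting of step S2 can be sketched with a single regular expression. This is an illustration under assumptions: the exact delimiter set (here half- and full-width spaces, "、", half- and full-width commas, and both slashes) and the name `split_independent` are not taken from the patent, and the sketch works on the raw string rather than on the tagged output of step S1.

```python
import re

# Delimiters named in the text: spaces (half- and full-width), the enumeration
# comma "、", commas (full- and half-width), and forward and backward slashes.
DELIMITERS = re.compile(r"[ \u3000、,,/\\]+")


def split_independent(sentence):
    """Split a short sentence into independent short sentences, dropping empty pieces."""
    return [part for part in DELIMITERS.split(sentence) if part]


if __name__ == "__main__":
    print(split_independent("网站建设/平面设计/百度排名/微博营销 用品质说话!"))
    print(split_independent("红酒、食品、实物拍摄"))
    # A sentence without delimiters is returned unchanged as a single piece.
    print(split_independent("小程序开发"))
```

Splitting purely on punctuation treats every "、" as a sentence boundary; cases such as example 3 above, whose result the patent shows only in Figure 4, may need the dependency tree as well.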

Step S3: Process each independent short sentence as follows:

1. Dependency analysis.

A dependency is a relation between words: a binary, asymmetric relation between a head word and its dependent. The head word of a sentence is usually a verb, and all other words depend on it directly or indirectly.

The following syntactic dependency relations are mainly used for crowdsourced resources: 1) coordinate relation (COO); 2) compound noun (FM); 3) other noun modifier (QM); 4) direct object (VOB); 5) indirect object (IOB); 6) fronted object (FOB); 7) adverbial relation (ADV); 8) subject-predicate relation (SBV); 9) attributive relation (ATT); 10) independent structure (IC).

The relations are defined as follows:

The coordinate relation (COO) links words joined by "and" or "or" (和, 与, 或) and is used in the subsequent handling of coordinated nouns or verbs. For example, "详情页和首页设计" (detail page and home page design): {(详情页/n -> 首页/n, COO)}.

The compound noun relation (FM) denotes a noun modifying a noun. After splitting, the nouns serve as objects describing what the verb acts on. For example, "菜式甜点甜品" (dishes, desserts, sweets): {(菜式/n -> 甜点/n, FM); (甜点/n -> 甜品/n, FM)}.

The other noun modifier relation (QM) denotes a word of any other part of speech modifying a noun. It describes an attribute of the noun and is used for finer classification. For example, "小程序开发" (mini program development): {(小/a -> 程序/n, QM)}.

The direct object relation (VOB) denotes the direct object that follows the predicate verb. Direct objects and fronted objects are both classed as verb-object structures, the basic dependency relation in crowdsourced corpora. For example, "送她一束花" (give her a bouquet of flowers): {(送/v -> 花/n, VOB)}.

The fronted object relation (FOB) denotes an object placed before the predicate verb. For example, "小程序开发": {(程序/n -> 开发/v, FOB)}.

The indirect object relation (IOB) denotes the personal word after the predicate verb. It appears occasionally in crowdsourced corpora, for example "送她一束花": {(送/v -> 她/n, IOB)}, but this structure carries little meaning and does not appear in the final dependency relations.

The subject-predicate relation (SBV) denotes the relation between the agent and the action in a sentence. For example, "我可以帮你……" (I can help you...): {(我/r -> 帮/v, SBV)}.

The adverbial relation (ADV) denotes a verb modifying a verb. For example, "我可以帮你……": {(可以/v -> 帮/v, ADV)}.

The attributive relation (ATT) denotes the relation between an attribute and its head. For example, "系统安全" (system security): {(系统/n -> 安全/adj, ATT)}.

The independent structure relation (IC) connects two independent semantic components. For example, "网站开发网站建设" (website development, website construction): {(网站开发 -> 网站建设, IC)}.
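One possible in-memory representation of these labeled relations is sketched below. The names `Relation`, `Arc`, and `arcs_with` are hypothetical and not from the patent; `Arc` simply mirrors the "(left -> right, LABEL)" notation used in the examples above without asserting which side is the head, since the examples do not fix a single direction.

```python
from enum import Enum
from typing import NamedTuple


class Relation(Enum):
    """The ten dependency labels listed in the text."""
    COO = "coordinate relation"
    FM = "compound noun"
    QM = "other noun modifier"
    VOB = "direct object"
    IOB = "indirect object"
    FOB = "fronted object"
    ADV = "adverbial relation"
    SBV = "subject-predicate relation"
    ATT = "attributive relation"
    IC = "independent structure"


class Arc(NamedTuple):
    source: str  # word/POS written on the left of the arrow
    target: str  # word/POS written on the right of the arrow
    relation: Relation


def arcs_with(arcs, relation):
    """Return the arcs that carry the given relation label."""
    return [arc for arc in arcs if arc.relation is relation]


# The two arcs quoted in the text for "送她一束花".
GIFT_ARCS = [
    Arc("送/v", "花/n", Relation.VOB),
    Arc("送/v", "她/n", Relation.IOB),
]

if __name__ == "__main__":
    print(arcs_with(GIFT_ARCS, Relation.VOB))
```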

2. Extract the independent functional components of each independent short sentence using coordinate relations and independent structures.

The functional description of a crowdsourced design resource may be short or long. A complex description usually contains several core business functions, in other words several independent functional components, with no punctuation separating them, for example "渗透测试漏洞扫描网站安全检测入侵检测安全测试" (penetration testing, vulnerability scanning, website security detection, intrusion detection, security testing) and "手机商城网站建设网上购物在线支付购物网站开发" (mobile mall website construction, online shopping, online payment, shopping website development). Such a description is clearly a collection of several business functions; for descriptions of this kind, the invention extracts the independent functional components of the independent short sentence through dependency analysis.

Specifically, the independent functional components are divided on the basis of dependency analysis according to coordinate relations (COO) and independent structures (IC). Because COO and IC relations in different positions give rise to different semantic relations, the invention distinguishes the following three cases.

(1) When the detected coordinate relation (COO) lies in the generalized object-modifier region, the sentence is divided, on the basis of the core verbal predicate, into two independent functional components of equal standing.

Take "微信和系统安全测试" (WeChat and system security testing) as an example; its dependency analysis tree is shown in Figure 5. Based on the semantics, with the core verbal predicate "测试" (test) as the basis, the coordinated objects "微信" (WeChat) and "系统" (system) are each combined with "测试" to form two independent functional components of equal standing:

Independent functional component 1 (the construction of the multi-level semantics is detailed below):

First-level semantics: {(测试, 安全, (ATT))}

Second-level semantics: {(微信, 安全, (ATT))}

Independent functional component 2:

First-level semantics: {(测试, 安全, (ATT))}

Second-level semantics: {(系统, 安全, (ATT))}
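Case (1) can be sketched as a function that copies the shared first-level triple into one component per coordinated word. This is an illustrative sketch only: the function name, the dictionary layout, and the assumption that the coordinated words have already been read off the dependency tree are all hypothetical.

```python
def split_coordinated_modifiers(predicate, obj, relation, conjuncts,
                                modifier_relation="ATT"):
    """Build one component per coordinated word in the object-modifier region.

    Every component shares the first-level triple (predicate, obj, relation);
    each coordinated word contributes its own second-level triple.
    """
    components = []
    for word in conjuncts:
        components.append({
            "level1": [(predicate, obj, relation)],
            "level2": [(word, obj, modifier_relation)],
        })
    return components


if __name__ == "__main__":
    # "微信和系统安全测试": 微信 and 系统 are coordinated modifiers of 安全.
    for component in split_coordinated_modifiers("测试", "安全", "ATT", ["微信", "系统"]):
        print(component)
```

Run on the "微信和系统安全测试" example, the sketch yields two components whose triples match the first- and second-level semantics listed above.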

(2) When the detected coordinate relation lies between core verbal predicates, the sentence is divided into two independent functional components of equal standing.

Take "渗透测试漏洞扫描网站安全检测入侵检测安全测试" as an example. This description is long and unclear, but the dependency analysis tree can express the semantic and logical relations in it. From the tree shown in Figure 6, three core verbal predicates can be detected: "测试1" (test 1), "检测" (detect), and "测试2" (test 2), where the suffixes 1 and 2 distinguish the two occurrences of the same word:

Each pair of core verbal predicates stands in a coordinate relation and is divided into independent functional components of equal standing, which in this embodiment yields three independent functional components (the construction of the multi-level semantics is detailed below):

Independent functional component 1:

First-level semantics: {(测试1, 渗透, (ATT)), (测试1, 扫描, (ATT))}

Second-level semantics: {(漏洞, 扫描, (ATT))}

Independent functional component 2:

First-level semantics: {(检测, 网站, (ATT)), (检测, 入侵, (ATT)), (检测, 安全, (ATT))}

Independent functional component 3:

First-level semantics: {(测试2, 安全, (ATT))}

(3) When a coordinate relation and an independent structure are both present: if the independent structure is a verbal predicate and a coordinate relation exists, the sentence is divided, on the basis of the verbal predicate of the independent structure, into two independent functional components of equal standing; if the independent structure is not a verbal predicate, no independent functional component is created.

Take "上海公司企业社保开户专业注册记账报税代理经验丰富超值特惠" (Shanghai company corporate social security account opening, professional registration, bookkeeping, tax filing agency, rich experience, great-value offer) as an example. From the dependency analysis tree shown in Figure 7, one core verbal predicate "开户" (open an account) and the independent structures "注册" (register), "记账" (bookkeeping), "代理" (act as agent), "经验丰富" (rich experience), and "特惠" (special offer) can be detected. The independent structures "注册", "记账", and "代理" are all verbal predicates and are pairwise coordinated, so each of them becomes the basis of an independent functional component of equal standing. Together with the component of the core predicate this gives four independent functional components. "经验丰富" and "特惠" are not verbal predicates, so no independent functional component is created for them:

Independent functional component 1:

First-level semantics: {(开户, 社保, (SBV))}

Second-level semantics: {(企业, 社保, (ATT))}

Third-level semantics: {(上海公司, 企业, (ATT))}

Independent functional component 2:

First-level semantics: {(注册, ∅, (∅))}

Second-level semantics: {(专业, 注册, (ADV))}

Independent functional component 3:

First-level semantics: {(记账, ∅, (∅))}

Independent functional component 4:

First-level semantics: {(代理, ∅, (∅))}

Second-level semantics: {(报税, 代理, (ADV))}
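The selection rule of case (3) can be sketched as a filter over the detected independent structures: verbal ones become the heads of new components, non-verbal ones are skipped. The sketch rests on assumptions that are not in the patent: the function name is hypothetical, the part-of-speech tags given to "经验丰富" and "特惠" are illustrative guesses, and any tag starting with "v" is treated as verbal.

```python
def component_heads(core_predicate, independent_structures):
    """Return the head word of every independent functional component.

    independent_structures is a list of (word, pos_tag) pairs for the IC nodes.
    Only verbal structures (tag starting with "v", an assumption) start a
    component; the core predicate always heads the first component.
    """
    heads = [core_predicate]
    for word, pos in independent_structures:
        if pos.startswith("v"):
            heads.append(word)
    return heads


if __name__ == "__main__":
    # Tags for the non-verbal structures are hypothetical.
    structures = [
        ("注册", "v"),
        ("记账", "v"),
        ("代理", "v"),
        ("经验丰富", "a"),
        ("特惠", "n"),
    ]
    print(component_heads("开户", structures))
```

For the social security example this gives four heads, matching the four independent functional components listed above.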

3. Build a multi-level semantic model for each independent functional component.

The invention builds the multi-level semantic model on the following definitions:

Level-1 semantics take a verbal predicate as the head word and a noun phrase as the dependent. They express the basic meaning of the sentence through the direct-object relation (VOB), the indirect-object relation (IOB), the fronted-object relation (FOB) and similar relations, and generally represent one main function offered by the service resource.

Level-2 semantics build on level 1: modifiers attached through the adverbial relation (ADV), the verb-complement relation (CMP) and the preposition-object relation (POB) qualify and supplement their respective head words.

Level-3 semantics build on level 2: they further modify and supplement the head words of the level-2 semantics, making the meaning richer and more complete.

For service resources described in natural-language phrases, three levels are generally enough to express the full meaning of the resource. Grammatical components deeper than level 3 can be handled either by truncation or by completing them into the level-3 head word.
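One way to hold such a model in code is a container with exactly three levels that applies the truncation option to anything deeper. The sketch below is an assumption about representation, not something the patent prescribes: the class name `SemanticModel`, the `add` method and the placeholder word "某" are hypothetical. Triples follow the order used in the examples of this document: (predicate, object, relation) at level 1 and (modifier, head word, relation) at levels 2 and 3.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (word, head word or object, relation label)


@dataclass
class SemanticModel:
    """Three-level semantic model of one independent functional component."""
    levels: Dict[int, List[Triple]] = field(
        default_factory=lambda: {1: [], 2: [], 3: []})

    def add(self, level: int, triple: Triple) -> bool:
        """Store a triple; components deeper than level 3 are truncated."""
        if level < 1:
            raise ValueError("levels start at 1")
        if level > 3:
            return False  # truncation: the component is dropped
        self.levels[level].append(triple)
        return True


model = SemanticModel()
model.add(1, ("开户", "社保", "SBV"))
model.add(2, ("企业", "社保", "ATT"))
model.add(3, ("上海公司", "企业", "ATT"))
# Hypothetical level-4 modifier, used only to show the truncation branch.
kept = model.add(4, ("某", "上海公司", "ATT"))
print(model.levels, kept)
```

Truncation loses the deeper modifier entirely; the alternative named in the text, completing it into the level-3 head word, keeps the wording at the cost of longer level-3 entries.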

When building level-1 semantics, errors in dependency parsing must be taken into account: many relations that are semantically verb-object are labelled as attributive (ATT) relations. Therefore, at level 1, if the core verbal predicate and the noun phrase it governs are linked by ATT, the link is also treated as a verb-object relation. Likewise, if they are linked by the subject-verb relation (SBV), the link is treated as a verb-object relation. When the object of the core verbal predicate is missing, the cause may be either the wording itself or an error of the dependency parsing model. In that case the meaning is carried mainly by the core predicate and its modifiers, so in this application a modifier expressed through ATT is also treated as the object and placed at level 1.

Accordingly, when building level-1 semantics, this application refers collectively to the following as the generalized verb-object relation: the direct-object relation, the indirect-object relation, the fronted-object relation, an adverbial relation between the core verb predicate and the noun phrase it governs, a subject-verb relation between the core verb predicate and the noun phrase it governs, and, when the object of the core verb predicate is missing, a modifier expressed through the attributive relation. The reason for this design is that in these cases the governed noun can be regarded as playing the patient role and, together with the verbal predicate, expresses the meaning completely, which yields as general a semantic representation model as possible.
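The selection of the level-1 object can be sketched as a lookup over the dependents of the core verb predicate. The function name `level1_triple`, the `is_nominal` flag and the priority order (true object relations first, then SBV, then ATT) are assumptions made for this illustration; the text does not state an order. The fallback relations follow the parser-error discussion above and the worked examples in this document, which use SBV and ATT at level 1.

```python
from typing import List, Optional, Tuple

OBJECT_RELATIONS = ("VOB", "IOB", "FOB")  # genuine object relations
FALLBACK_RELATIONS = ("SBV", "ATT")       # also treated as verb-object at level 1

# (word, relation to the core predicate, whether the word is nominal)
Dependent = Tuple[str, str, bool]


def level1_triple(core: str, dependents: List[Dependent]
                  ) -> Tuple[str, Optional[str], Optional[str]]:
    """Pick the generalized object of the core verb predicate.

    Returns (core, object, relation); object and relation are None when no
    nominal dependent qualifies, which corresponds to the empty slot ∅.
    """
    for wanted in OBJECT_RELATIONS + FALLBACK_RELATIONS:
        for word, rel, is_nominal in dependents:
            if rel == wanted and is_nominal:
                return (core, word, rel)
    return (core, None, None)


print(level1_triple("开户", [("社保", "SBV", True)]))
print(level1_triple("注册", [("专业", "ADV", False)]))
```

The first call reproduces the SBV case treated as verb-object; the second has only an adverbial modifier, so the object slot stays empty.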

Based on these definitions, the invention proposes a method for building the multi-level semantic model, shown in Figure 8, comprising:

1. Traverse the independent functional component to find the core verb predicate, and obtain the level-1 semantics from the detected generalized verb-object relation.

Here the generalized verb-object relation comprises the direct-object relation, the indirect-object relation, the fronted-object relation, an adverbial relation between the core verb predicate and the noun phrase it governs, a subject-verb relation between the core verb predicate and the noun phrase it governs, and, when the object of the core verb predicate is missing, a modifier expressed through the attributive relation.

2. For the predicate head word and the object head word in the level-1 semantics, detect their modifiers to obtain the level-2 semantics.

The predicate head word and the object head word are modified and supplemented through the adverbial relation (ADV), the verb-complement relation (CMP) and the preposition-object relation (POB).

3. For the head words of the level-2 semantics, detect their modifiers to obtain the level-3 semantics.

The head words of the level-2 semantics are not the predicate head word and object head word of level 1 but the modifiers that supplement those two. If such a modifier has no further modifiers of its own, no level-3 semantics are built.

4. Complete semantic components deeper than level 3 into the head word to which they belong.

The whole extraction traverses the dependency tree bottom-up, against the direction of the dependency arcs.
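Step 4 can be read as merging every word below a level-3 word into that word. The sketch below follows that reading, which is an interpretation rather than a procedure spelled out in the text. The function name `complete_word`, the parse, and in particular the extra modifier "纯" (pure) are hypothetical and serve only to create a fourth level.

```python
from typing import Dict, List


def complete_word(index: int, words: List[str], heads: List[int]) -> str:
    """Merge a word with all words in its subtree, in surface order."""
    children: Dict[int, List[int]] = {}
    for i, head in enumerate(heads):
        children.setdefault(head, []).append(i)
    subtree, stack = [], [index]
    while stack:
        node = stack.pop()
        subtree.append(node)
        stack.extend(children.get(node, []))
    return "".join(words[i] for i in sorted(subtree))


# Hypothetical parse: heads[i] is the index of the head of words[i], -1 = root.
# "纯" modifies "手工", which sits at level 3, so "纯" would be level 4.
words = ["自制", "纯", "手工", "护肤品", "顶级", "卸妆油"]
heads = [-1, 2, 3, 5, 5, 0]
print(complete_word(2, words, heads))
```

Under this reading the level-3 entry would carry "纯手工" instead of "手工", so no wording is lost even though the model stays three levels deep.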

Take "自制手工护肤品顶级卸妆油" (roughly: "home-made, handmade skincare product, top-grade cleansing oil") as an example; its dependency tree is shown in Figure 9. The core verb predicate "自制" (home-made) and its object "卸妆油" (cleansing oil) form the level-1 semantics. At level 2, the predicate head word "自制" has no modifiers, while the modifiers "顶级" (top-grade) and "护肤品" (skincare product) of the object head word "卸妆油" form the level-2 semantics. The level-2 head word "护肤品" has a modifier of its own, so level-3 semantics are built with "护肤品" as the head word. The multi-level semantic model obtained by the steps above is:

Level-1 semantics: {(自制, 卸妆油, (VOB))}, i.e. {(home-made, cleansing oil, (VOB))}

Level-2 semantics: {(顶级, 卸妆油, (ATT)), (护肤品, 卸妆油, (ATT))}, i.e. {(top-grade, cleansing oil, (ATT)), (skincare product, cleansing oil, (ATT))}

Level-3 semantics: {(手工, 护肤品, (ATT))}, i.e. {(handmade, skincare product, (ATT))}

As can be seen, the level-1 semantics reflect the main function of the service resource, while the level-2 and level-3 semantics modify the core words.
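The first three steps can be sketched over a parsed phrase as below. This is a simplified illustration under stated assumptions: `build_model` is a hypothetical name, the arcs in `tokens` are inferred from the result given in the text (Figure 9 is not reproduced here), the relation priority is assumed, and the sketch handles a single independent functional component only. It accepts any relation label at levels 2 and 3, because the examples in this document list ATT modifiers there.

```python
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, int, str]   # (word, head index or -1 for the root, relation)
Triple = Tuple[str, str, str]

GENERALIZED_VOB = ("VOB", "IOB", "FOB", "SBV", "ATT")  # assumed priority order


def build_model(tokens: List[Token]) -> Dict[int, List[Triple]]:
    children: Dict[int, List[int]] = {}
    for i, (_, head, _) in enumerate(tokens):
        children.setdefault(head, []).append(i)
    root = children[-1][0]
    core = tokens[root][0]

    # Level 1: the core predicate plus its generalized object, if any.
    obj: Optional[int] = None
    for wanted in GENERALIZED_VOB:
        obj = next((i for i in children.get(root, [])
                    if tokens[i][2] == wanted), None)
        if obj is not None:
            break
    model: Dict[int, List[Triple]] = {1: [], 2: [], 3: []}
    if obj is None:
        model[1].append((core, "∅", "∅"))
        level1_heads = [root]
    else:
        model[1].append((core, tokens[obj][0], tokens[obj][2]))
        level1_heads = [root, obj]

    # Level 2: modifiers of the predicate head word and the object head word.
    level2_nodes: List[int] = []
    for head in level1_heads:
        for i in children.get(head, []):
            if i != obj:
                model[2].append((tokens[i][0], tokens[head][0], tokens[i][2]))
                level2_nodes.append(i)

    # Level 3: modifiers of the level-2 words.
    for head in level2_nodes:
        for i in children.get(head, []):
            model[3].append((tokens[i][0], tokens[head][0], tokens[i][2]))
    return model


# Assumed parse of 自制 / 手工 / 护肤品 / 顶级 / 卸妆油.
tokens = [("自制", -1, "HED"), ("手工", 2, "ATT"), ("护肤品", 4, "ATT"),
          ("顶级", 4, "ATT"), ("卸妆油", 0, "VOB")]
model = build_model(tokens)
for level in (1, 2, 3):
    print(level, model[level])
```

The order of triples inside a level follows word order in the phrase, so it may differ from the order in which the text lists them; the sets are the same.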

In practice, although the resource description of a single independent functional component has only one core business function, two cases arise:

(1) A core object exists at level 1

Take "低价出租临时网络空间" ("low-price rental of temporary network space") as an example. Dependency analysis gives the syntax tree shown in Figure 10, and its multi-level semantic model is:

Level-1 semantics: {(出租, 网络空间, (VOB))}, i.e. {(rent out, network space, (VOB))}

Level-2 semantics: {(低价, 出租, (ADV)), (临时, 网络空间, (ATT))}, i.e. {(low price, rent out, (ADV)), (temporary, network space, (ATT))}

(2) No core object exists at level 1

As mentioned above, the object of the core verbal predicate may be missing either because of the wording itself or because of an error in the dependency analysis. In that case the meaning is carried mainly by the core predicate and its modifiers, and in the invention a modifier expressed through the attributive (ATT) relation is still treated as the object and placed at level 1.

Take "网站安全检测" ("website security testing") as an example. Its dependency tree is shown in Figure 11, and its multi-level semantic model is:

Level-1 semantics: {(检测, 安全, (ATT))}, i.e. {(test, security, (ATT))}

Level-2 semantics: {(网站, 安全, (ATT))}, i.e. {(website, security, (ATT))}

With the semantic analysis method for crowdsourced design resources proposed by the invention, a short sentence describing a crowdsourced design resource is segmented into words, tagged with parts of speech and split into independent short sentences; each independent short sentence is divided into independent functional components; and a multi-level semantic model is built for each independent functional component. The unstructured natural-language description is thereby converted into a structured set of relations, which provides unified modelling of crowdsourced design resources and is of great value for subsequent retrieval and matching.

Finally, the semantic analysis method proposed by the invention is applied to "手机商城网站建设网上购物在线支付购物网站开发" (roughly: "mobile-phone mall website construction, online shopping, online payment, shopping website development") to obtain a structured set of relations. The dependency tree is shown in Figure 12, and the structuring proceeds as follows:

1. Perform word segmentation and part-of-speech tagging on the short sentence.

2. Split the short sentence processed in step 1 into multiple independent short sentences at spaces, enumeration commas (、), commas, slashes and backslashes.

This short sentence contains no spaces, enumeration commas, commas, slashes or backslashes, so this step is skipped and the sentence as a whole is treated as one independent short sentence.
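The delimiter-based split of step 2 can be sketched with a regular expression. The function name `split_independent_phrases` is hypothetical, and including the full-width space and the full-width comma alongside their ASCII forms is an assumption, since the text does not say which variants are meant. For simplicity the sketch works on the raw string rather than on segmented, tagged tokens.

```python
import re
from typing import List

# Space (half- and full-width), enumeration comma, commas (half- and
# full-width), slash and backslash. Runs of delimiters count as one break.
_DELIMITERS = re.compile(r"[ \u3000、,,/\\]+")


def split_independent_phrases(text: str) -> List[str]:
    """Split a resource description into independent short sentences."""
    return [part for part in _DELIMITERS.split(text) if part]


print(split_independent_phrases("手机商城网站建设网上购物在线支付购物网站开发"))
# Hypothetical input that does contain delimiters:
print(split_independent_phrases("网站建设、网站开发/网站维护"))
```

The example sentence of this section contains none of the delimiters, so it comes back as a single independent short sentence, as described above.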

3. Perform dependency analysis on the independent short sentence.

4. Extract the independent functional components of the independent short sentence according to coordinate relations and independent structures.

Traversing the independent short sentence finds the core verb predicates "建设" (construct) and "开发" (develop). By the coordinate relation (COO), the sentence is divided into two independent functional components with "建设" and "开发" as their respective core verb predicates: independent functional component 1 (core verb predicate "建设") and independent functional component 2 (core verb predicate "开发").

The independent short sentence contains no independent structures such as independent verbs or independent nouns, so no further division by independent structure is needed.
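Splitting by coordination can be sketched as: collect the root predicate and every predicate chained to it by COO, then give each of them the words of its own subtree. The function name `split_by_coordination` is hypothetical, and the arcs in `tokens` are an assumption inferred from the results reported in this example, since Figure 12 is not reproduced here.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, int, str]  # (word, head index or -1 for the root, relation)


def split_by_coordination(tokens: List[Token]) -> List[List[int]]:
    """Return one sorted list of token indices per core verb predicate."""
    children: Dict[int, List[int]] = {}
    for i, (_, head, _) in enumerate(tokens):
        children.setdefault(head, []).append(i)
    root = children[-1][0]

    # Core predicates: the root plus predicates linked to a core by COO.
    cores: List[int] = []
    queue = [root]
    while queue:
        node = queue.pop(0)
        cores.append(node)
        queue.extend(i for i in children.get(node, [])
                     if tokens[i][2] == "COO")
    core_set = set(cores)

    def subtree(node: int) -> List[int]:
        out = [node]
        for child in children.get(node, []):
            if child not in core_set:  # other cores own their own subtree
                out.extend(subtree(child))
        return sorted(out)

    return [subtree(core) for core in cores]


# Assumed parse of the example sentence.
tokens = [("手机", 1, "ATT"), ("商城", 2, "ATT"), ("网站", 3, "ATT"),
          ("建设", -1, "HED"), ("网上购物", 7, "ATT"), ("在线支付", 6, "ADV"),
          ("购物", 7, "ATT"), ("网站", 8, "ATT"), ("开发", 3, "COO")]
components = [[tokens[i][0] for i in part]
              for part in split_by_coordination(tokens)]
print(components)
```

Each resulting word list can then be modelled on its own, which is what step 5 does for the two components.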

5. Build a multi-level semantic model for each independent functional component.

For independent functional component 1, the generalized verb-object relation gives the level-1 semantics {(建设, 网站1, (ATT))}. Searching for the modifiers of the predicate head word "建设" and the object head word "网站" (website) in the level-1 semantics gives the level-2 semantics {(商城, 网站1, (ATT))}. Searching for the modifiers of the level-2 head word "商城" (mall) gives the level-3 semantics {(手机, 商城, (ATT))}. The resulting multi-level semantic model is:

Independent functional component 1:

Level-1 semantics: {(建设, 网站1, (ATT))}, i.e. {(construct, website1, (ATT))}

Level-2 semantics: {(商城, 网站1, (ATT))}, i.e. {(mall, website1, (ATT))}

Level-3 semantics: {(手机, 商城, (ATT))}, i.e. {(mobile phone, mall, (ATT))}

For independent functional component 2, the generalized verb-object relation gives the level-1 semantics {(开发, 网站2, (ATT))}. Searching for the modifiers of the predicate head word "开发" and the object head word "网站" in the level-1 semantics gives the level-2 semantics {(网上购物, 网站2, (ATT)), (购物, 网站2, (ATT))}. Searching for the modifiers of the level-2 head word "购物" (shopping) gives the level-3 semantics {(在线支付, 购物, (ADV))}. The resulting multi-level semantic model is:

Independent functional component 2:

Level-1 semantics: {(开发, 网站2, (ATT))}, i.e. {(develop, website2, (ATT))}

Level-2 semantics: {(网上购物, 网站2, (ATT)), (购物, 网站2, (ATT))}, i.e. {(online shopping, website2, (ATT)), (shopping, website2, (ATT))}

Level-3 semantics: {(在线支付, 购物, (ADV))}, i.e. {(online payment, shopping, (ADV))}

The suffixes 1 and 2 on "网站" only distinguish the two occurrences of the same word.
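The numbering of repeated words can be sketched as a small helper that appends an occurrence counter to any word appearing more than once in the segmented sentence. The function name `label_repeated_words` is hypothetical, and the word list reflects the segmentation implied by the triples above, which is an assumption about the segmenter's output.

```python
from collections import Counter
from typing import List


def label_repeated_words(words: List[str]) -> List[str]:
    """Append 1, 2, ... to words that occur more than once; keep the rest."""
    totals = Counter(words)
    seen: Counter = Counter()
    labeled = []
    for word in words:
        if totals[word] > 1:
            seen[word] += 1
            labeled.append(f"{word}{seen[word]}")
        else:
            labeled.append(word)
    return labeled


words = ["手机", "商城", "网站", "建设", "网上购物",
         "在线支付", "购物", "网站", "开发"]
print(label_repeated_words(words))
```

Labelling before the triples are built keeps the two "网站" tokens apart, so modifiers of the first are never attached to the second.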

It should be noted that the dependency tree referred to above is produced by existing analysis techniques for semantic analysis and is not a technical means limited by the invention; any existing method for obtaining it can support the semantic analysis method of the invention.

It should also be noted that, in a specific implementation, the control part described above can be realised by a hardware processor executing computer-executable instructions stored as software in a memory, which is not elaborated here. The programs corresponding to the actions performed by the control circuit can all be stored as software in a computer-readable storage medium of the system, so that the processor can call them and perform the operations corresponding to each of the modules above.

The computer-readable storage medium mentioned above may include volatile memory, such as random-access memory; non-volatile memory, such as read-only memory, flash memory, a hard disk or a solid-state drive; or a combination of these kinds of memory.

The processor mentioned above may also be a collective term for multiple processing elements. For example, the processor may be a central processing unit, or another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor; the processor may also be a special-purpose processor.

It should be pointed out that the above description does not limit the invention, and the invention is not restricted to the examples given. Changes, modifications, additions or substitutions made by those of ordinary skill in the art within the essential scope of the invention also fall within its scope of protection.

Claims (4)

1. A semantic analysis method for crowdsourced design resources, characterized by comprising the following steps:
step 1, performing word segmentation and part-of-speech tagging on a short sentence describing a crowdsourced design resource;
step 2, splitting the short sentence processed in step 1 into a plurality of independent short sentences;
step 3, for each independent short sentence, performing the following processing:
1) performing dependency analysis;
2) extracting the independent functional components of the independent short sentence according to coordinate relations and independent structures;
3) building a multi-level semantic model for each independent functional component:
(1) traversing the independent functional component to find the core verb predicate, and obtaining the level-1 semantics from the detected generalized verb-object relation; the generalized verb-object relation comprises the direct-object relation, the indirect-object relation, the fronted-object relation, an adverbial relation between the core verb predicate and the noun phrase it governs, a subject-verb relation between the core verb predicate and the noun phrase it governs, and a modifier expressed through the attributive relation when the object of the core verb predicate is missing;
(2) detecting the modifiers of the predicate head word and the object head word in the level-1 semantics to obtain the level-2 semantics;
(3) detecting the modifiers of the head words of the level-2 semantics to obtain the level-3 semantics;
(4) completing detected semantic components deeper than level 3 into the head word to which they belong.
2. The semantic analysis method for crowdsourced design resources of claim 1, wherein in step 2 the short sentence is split into a plurality of independent short sentences at spaces, enumeration commas, commas, slashes and backslashes.
3. The semantic analysis method for crowdsourced design resources of claim 1, wherein extracting the independent functional components of the independent short sentence according to coordinate relations and independent structures comprises:
(1) when a detected coordinate relation lies in the generalized object modification region, dividing equally into two independent functional components on the basis of the core verb predicate;
(2) when a detected coordinate relation lies between core verb predicates, dividing equally into two independent functional components, one for each core verb predicate;
(3) when a coordinate relation and an independent structure exist at the same time: if the independent structure is a verbal predicate and is in a coordinate relation, dividing equally into two independent functional components on the basis of the verbal predicate of the independent structure; if the independent structure is not a verbal predicate, creating no independent functional component.
4. The semantic analysis method for crowdsourced design resources of claim 1, wherein building a multi-level semantic model for each independent functional component comprises:
when it is detected that no core object exists in the level-1 semantics, treating the modifier of the core object as the object and building it into the level-1 semantics.
CN202210543747.6A 2022-05-19 2022-05-19 Semantic analysis method for crowdsourcing design resources Active CN114970543B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210543747.6A CN114970543B (en) 2022-05-19 2022-05-19 Semantic analysis method for crowdsourcing design resources

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210543747.6A CN114970543B (en) 2022-05-19 2022-05-19 Semantic analysis method for crowdsourcing design resources

Publications (2)

Publication Number Publication Date
CN114970543A true CN114970543A (en) 2022-08-30
CN114970543B CN114970543B (en) 2024-11-01

Family

ID=82985215

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210543747.6A Active CN114970543B (en) 2022-05-19 2022-05-19 Semantic analysis method for crowdsourcing design resources

Country Status (1)

Country Link
CN (1) CN114970543B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004287679A (en) * 2003-03-20 2004-10-14 Fuji Xerox Co Ltd Natural language processing system and natural language processing method and computer program
KR20100093736A (en) * 2009-02-17 2010-08-26 손길연 The way to study english by matching between a player and another with english puzzles combined various colors and diagrams on the base of cognitive linguistics
CN101937430A (en) * 2010-09-03 2011-01-05 清华大学 A Method for Extracting Event Sentence Patterns in Chinese Sentences
CN113128237A (en) * 2021-04-09 2021-07-16 青岛海大新星软件咨询有限公司 Semantic representation model construction method for service resources

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116484870A (en) * 2022-09-09 2023-07-25 北京百度网讯科技有限公司 Method, device, equipment, medium and computer product for extracting text information
CN116484870B (en) * 2022-09-09 2024-01-05 北京百度网讯科技有限公司 Method, device, equipment and medium for extracting text information

Also Published As

Publication number Publication date
CN114970543B (en) 2024-11-01

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant