WO2022037103A1 - Time-space boundary-oriented multi-party service value-quality-capability index alignment method - Google Patents

Time-space boundary-oriented multi-party service value-quality-capability index alignment method

Info

Publication number
WO2022037103A1
Authority
WO
WIPO (PCT)
Prior art keywords
index
relationship
service
semantic
space
Prior art date
Application number
PCT/CN2021/089373
Other languages
French (fr)
Chinese (zh)
Inventor
涂志莹
李敏
王忠杰
徐晓飞
徐汉川
Original Assignee
哈尔滨工业大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 哈尔滨工业大学
Publication of WO2022037103A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395Quality analysis or management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • G06F40/169Annotation, e.g. comment data or footnotes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/758Involving statistics of pixels or of feature values, e.g. histogram matching

Definitions

  • The invention belongs to the technical field of enterprise interoperability in software engineering, in particular to the alignment of non-functional service attributes among multiple participants, and relates to a space-time-boundary-oriented method for aligning multi-party service value-quality-capability indexes.
  • Enterprise interoperability is a prerequisite for service participants to exchange data and information, reach consensus on service requirements and service goals, and establish stable cooperative relationships and reliable cooperation models.
  • The "European Interoperability Framework for Pan-European E-Government Services" (EIF) identifies three types of interoperability: organizational, technical and semantic.
  • Organizational interoperability concerns enterprise organizational structure and business implementation processes, and can be addressed with modeling specifications and model transformation methods. Technical interoperability covers interaction interfaces and data integration, representation and exchange, and usually relies on standardized metadata formats and meanings as a reference to achieve data consistency. Semantic interoperability aims to eliminate inconsistencies in the information exchanged between different enterprises.
  • A service evaluation index is a statistical index for measuring and evaluating service value, quality and capability. It provides reference information for service decision-making and optimization, and it is an important subject of negotiation when service providers establish cooperative relations.
  • Evaluation indicators contain rich semantic information as well as detailed qualitative and quantitative descriptions. Different participants have their own norms and habits for defining, interpreting, quantifying and weighting indicators.
  • A prerequisite for cooperation among the parties is therefore to align both the semantics and the quantification methods of their service evaluation indicators, so that each party can accurately understand what the others' indicators express and what their values mean during multi-party collaboration.
  • Traditional research on the semantic interoperability of heterogeneous enterprise models mainly uses ontologies as the semantic model: a domain ontology is established through ontology construction or reconstruction techniques (ontology hybridization, synthesis, mutation, etc.) and serves as the semantic reference for model interoperability.
  • Ontology-based semantic mapping rules and strategies then realize semantic alignment between heterogeneous enterprise models, including term alignment, conceptual granularity alignment, perspective alignment and coverage alignment. They resolve semantic conflicts between heterogeneous models, such as the same name with different meanings, different names with the same meaning, and inconsistent concept scope, so that alliance members can ultimately share information and cooperate on business. These alignment schemes, however, cannot align the way indicators are measured.
  • The present invention provides a method for aligning multi-party service evaluation indicators oriented to the space-time boundary.
  • A multi-party service value-quality-capability index alignment method oriented to the space-time boundary comprises the following steps:
  • Step 1: Extract keyword groups covering service content, business activities, index evaluation aspects and index evaluation rules from the index definition, wherein:
  • The index definition includes the index name, abbreviation/idiom, English abbreviation, index explanation, superior direction, dimension (unit + order of magnitude), value range and calculation formula;
  • (1) Service content includes service providers (personnel roles, system tools, software applications, etc.), service carriers (commodities, orders, knowledge, data, etc.) and the service execution environment and context, generally expressed as noun phrases;
  • (2) Business activities include the specific implementation behaviors of service providers and the detailed ways in which service carriers are handled, generally expressed as verb phrases;
  • (3) Evaluation aspects are modifiers of service content and business activities, generally expressed in the form "XX rate";
  • Step 2: Using the public dictionaries, the domain dictionaries and the self-built dictionary, calculate the morpheme relationships between the four keyword groups of the two indicators to obtain the semantic similarity matrix between the indicators, wherein:
  • The public dictionaries include the synonym thesaurus Cilin (extended version), the HowNet dictionary and the Baidu Chinese dictionary;
  • The domain dictionaries include the Sogou industry thesaurus and the Baidu industry thesaurus. A domain dictionary is a list of domain-specific concepts established by domain experts from their understanding and experience of the field, and each entry has six fields: concept identifier, concept name, synonyms, English name, semantic description and application field;
  • Each phrase definition in the self-built dictionary includes the ID, phrase, part of speech, category (one of service content, business activity, index evaluation aspect, index evaluation rule), synonyms, antonyms, similar words, hypernyms, hyponyms, causally related phrases, belonging/source-related phrases, usage/tool-related phrases, composition/whole-part-related phrases and execution-dependency-related phrases;
  • The morpheme relationships are of four types: similar (highly similar), close (weaker than similar), related, and same-class;
  • The semantic similarity matrix is a two-dimensional matrix whose two dimensions are the four keyword group sets of the two indicators;
  • Step 3: Determine the semantic relationship between the indicators with the help of the semantic similarity matrix, and calculate the relationship confidence, wherein:
  • The semantic relationships include similarity relationships ((1) same index; (2) conjugate index; (3) subordinate index), related relationships ((4) service content related; (5) business related; (6) indicator correlation) and same-class relationships ((7) similar evaluation aspects; (8) similar business; (9) similar service content);
  • Step 4: Determine the semantic relationships of all indicators according to Step 3 to obtain a semantic relationship network, then delete redundant edges according to the direction and number of semantic relationships between indicators to simplify the network, wherein:
  • The semantic relationship network is a network with indicators as nodes and the semantic relationships between indicators as edges.
  • The edge attributes are the semantic relationship type and the confidence. Edges are either directed or undirected; for example, relationship (5), business related, is directed;
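  • A minimal Python sketch of the semantic relationship network and its simplification follows. The source does not spell out the pruning rule, so the rule used here (keep only the highest-confidence edge between each pair of indicators) is an assumption, as are the names `Edge` and `simplify` and the sample confidences.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    src: str           # source indicator (node)
    dst: str           # target indicator (node)
    relation: str      # semantic relationship type
    confidence: float  # relationship confidence in [0, 1]
    directed: bool     # True for directional relations such as business related

def simplify(edges):
    """Assumed pruning rule: keep the highest-confidence edge per indicator pair."""
    best = {}
    for e in edges:
        key = frozenset((e.src, e.dst))  # unordered pair of indicators
        if key not in best or e.confidence > best[key].confidence:
            best[key] = e
    return sorted(best.values(), key=lambda e: (e.src, e.dst))

# Illustrative edges only; the confidences are made up.
edges = [
    Edge("packaging firmness", "transport integrity", "business_related", 0.8, True),
    Edge("transport integrity", "packaging firmness", "similar_aspect", 0.4, False),
    Edge("packaging rate", "packaging efficiency", "same", 0.9, False),
]
network = simplify(edges)
```

  • Keying on an unordered pair means a directed and an undirected edge between the same two indicators compete with each other; a stricter rule could treat them separately.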
  • Step 5: Fit the distribution characteristics of each indicator in the single domain and the rich domain from sample data of the indicator within different space-time boundaries, wherein:
  • Time refers to different time domains;
  • Space refers to different geographic domains;
  • Boundary refers to different service implementation environments (online or offline), different service implementation platforms or different service participants;
  • The single-domain distribution feature is the probability distribution of the indicator within one service domain;
  • The rich-domain distribution feature is the probability distribution of the indicator across two or more service domains;
  • Step 6: Establish the alignment relationship between index quantification methods, using probability quantiles as the reference, wherein:
  • Aligning index quantification methods means finding the index value range that corresponds to a given service level under different space-time boundary characteristics, or determining the service level that corresponds to an index value under a specific space-time boundary.
  • The present invention has the following advantages:
  • The present invention does not depend on ontology construction. Instead, it uses common natural language processing methods to extract the keywords contained in the sentences that define and explain indicators, and uses the lexical information and morpheme relationships contained in public dictionaries and domain dictionaries to mine the correlations between different indicators.
  • The present invention summarizes the factors that lead to inconsistent quantification methods when multiple participants collaborate, and considers, from the perspective of space and time, the mapping between the specific value of an indicator in a multi-dimensional service implementation environment and the actual service level it expresses, thereby aligning index quantification.
  • Fig. 1 is the multi-party service value-quality-capability index alignment method framework oriented to the space-time boundary of the present invention
  • Fig. 2 is the method framework of the multi-participant service value-quality-capability index semantic alignment oriented to domain features of the present invention
  • Fig. 3 is the method framework of the multi-participant service value-quality-capability index quantification method alignment oriented to spatiotemporal features of the present invention
  • Fig. 4 is the principle of index relation judgment in the semantic alignment stage of the present invention.
  • FIG. 5 is an example diagram of the keyword analysis of the domain feature-oriented service evaluation index of the present invention.
  • FIG. 6 is a schematic diagram of semantic alignment of domain feature-oriented multi-participant service evaluation indicators of the present invention.
  • FIG. 7 is an example diagram of a single-domain distribution feature of indicators oriented to spatiotemporal features of the present invention.
  • FIG. 8 is an exemplary diagram of a spatiotemporal feature-oriented index rich domain distribution feature of the present invention.
  • FIG. 9 is a theoretical diagram of alignment of the spatiotemporal feature-oriented multi-participant service evaluation index quantification method according to the present invention.
  • The invention provides a multi-party service value-quality-capability index alignment method oriented to the space-time boundary.
  • The method is divided into two parts: semantic alignment of multi-participant service evaluation indexes oriented to domain characteristics, and alignment of the quantification methods of multi-participant service evaluation indexes oriented to space-time boundary characteristics. The framework is shown in Figures 1-3.
  • The purpose of semantic alignment in the present invention is as follows. Given a known multi-domain, multi-participant service value-quality-capability evaluation index system, the key elements of the indicators are extracted with natural language processing techniques. The semantic relationships between the four types of phrases are then calculated with the help of public dictionaries, domain dictionaries and self-built dictionaries. Finally, the relationship between indicators is determined on the basis of the lexical relationship matrix and the relationship confidence is calculated, which yields the multi-domain, multi-participant index semantic relationship network.
  • Each participant can learn from the semantic relationship network how its own service indexes relate to those of other parties. These relationships are not limited to cases of the same name with different meanings or different names with the same meaning; richer semantic relationships can also be mined.
  • The original index definition includes the index name, abbreviation/idiom, English abbreviation, index explanation, superior direction, dimension (unit + order of magnitude), value range, calculation formula, etc.
  • The abbreviation/idiom and English abbreviation are highly domain-specific, so the explanations contained in the domain dictionary are needed to understand them. Index names and explanations lack standardization, and the naming conventions and level of explanatory detail differ between participants. The calculation formula also implies relationships between indexes.
  • The present invention completes index preprocessing in the first step: it extracts the key elements of each index through natural language processing techniques such as word segmentation, part-of-speech tagging, dependency syntax analysis and word frequency statistics, and eliminates words that are difficult to understand or irrelevant to service evaluation, obtaining four types of phrases: [service content, business activities, index evaluation aspect, index evaluation rules].
  • Service content: includes the roles of the personnel involved in service implementation, the resources that service execution depends on, and the tangible products or valuable knowledge and information that accompany service delivery; generally expressed as proper nouns.
  • Index evaluation aspect: nouns that modify service content or business activities, generally with specific suffixes such as "XX rate", "XX degree", "XX effect" or "XX nature".
  • Index evaluation rules: evaluation indicators have specific evaluation frequencies and objects, such as daily average, monthly average or annual average, or per person, per order or per case.
  • The index relationships of the present invention are judged mainly on the basis of three types of dictionaries: public dictionaries, domain dictionaries and self-built dictionaries. The lexical richness, the level of detail of lexical relationships and lexical explanations, and the organizational structure of the dictionaries all affect the reliability of the calculation result.
  • The present invention selects the synonym thesaurus Cilin (extended version), the HowNet dictionary and the Baidu Chinese dictionary as reference public dictionaries, and the Sogou industry thesaurus and the Baidu industry thesaurus as reference domain dictionaries. The self-built dictionary contains the ID, phrase, part of speech, category (one of service content, business activity, evaluation aspect, evaluation rules), synonyms, antonyms, similar words, hypernyms, hyponyms, causally related phrases, belonging/source-related phrases, usage/tool-related phrases, composition/whole-part-related phrases, execution-dependency-related phrases, etc. The information in these dictionaries is then used together to calculate the relationships between the four types of phrases.
  • The present invention defines three major categories and nine sub-categories of semantic correlation between indexes. The nine categories are explained as follows:
  • (1) Same index: the service content, business activities, evaluation aspects and modifiers of the two indexes all correspond and all have highly similar semantics. E.g., food packaging rate and food packaging efficiency.
  • (2) Conjugate index: the service content and business activities are highly similar, but the evaluation aspects are antonyms of each other. E.g., the cleanliness of the restaurant and the degree of clutter in the dining environment.
  • (3) Subordinate index: corresponding words of the two indexes are in a subordinate relationship (word A is a component of word B, or word A is a subcategory of word B). E.g., commodity defective rate and fresh-goods defective rate.
  • (4) Service content related: the business activities are similar (if both exist), the evaluation aspects are close (a weaker relation than similar), and the service contents are related in some way. E.g., the health status of the chef and the hygiene of the dishes: the dishes are made by the chef, and health and hygiene are close.
  • (5) Business related: the service content is similar, the evaluation aspects are close, and the business activities are related in some way. E.g., the firmness of food packaging and the degree to which food is transported undamaged: packaging is a predecessor activity of transportation, and firmness and freedom from damage are close.
  • (6) Indicator correlation: there is no obvious correlation between service content and business activities, but the indicator description contains accompanying expressions such as "with XXX" or "the more XX, the more XX", which indicates a correlation between the two indicators. If the two change in the same direction the correlation is positive; otherwise it is negative. E.g., the delivery time of dishes is negatively correlated with the degree of quality assurance of the dishes: the longer the delivery time, the worse the quality assurance.
  • (7) Similar evaluation aspects: the evaluation aspects are similar, but the service content and business activities are neither similar nor related, or were not extracted. In this case the two indexes can be roughly treated as same-class. E.g., dish packaging accuracy and order accounting accuracy.
  • (8) Similar business: the business activities are similar, but the service content and evaluation aspects are neither similar nor related. E.g., the accuracy of food packaging and the firmness of food packaging.
  • (9) Similar service content: the service content is similar, but the business activities and evaluation aspects are neither similar nor related. E.g., commodity storage time and the proportion of finished commodities.
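  • The nine categories above lend themselves to a rule table. The Python sketch below is a simplified illustration rather than the patented procedure: it collapses "highly similar" and "close" into one "similar" label, omits category (6) because that category depends on textual cues in the description, and the function name `classify` and the label strings are hypothetical.

```python
def classify(content, business, aspect):
    """Map word-group relations to an index relationship (simplified).

    Each argument is "similar", "antonym", "hyponym", "related" or None,
    where None means neither similar nor related, or not extracted.
    """
    if content == business == aspect == "similar":
        return "same index"
    if content == "similar" and business == "similar" and aspect == "antonym":
        return "conjugate index"
    if "hyponym" in (content, business, aspect):
        return "subordinate index"
    if content == "related" and aspect == "similar" and business in ("similar", None):
        return "service content related"
    if business == "related" and content == "similar" and aspect == "similar":
        return "business related"
    if aspect == "similar" and content is None and business is None:
        return "similar evaluation aspects"
    if business == "similar" and content is None and aspect is None:
        return "similar business"
    if content == "similar" and business is None and aspect is None:
        return "similar service content"
    return "no relationship"

# Chef health status vs. dish hygiene: content related, aspects similar.
chef_vs_dish = classify("related", None, "similar")
```

  • The rules are checked from the strongest relationship to the weakest, so an index pair that satisfies several rules is assigned the most specific category first.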
  • The main work of the preprocessing stage is to extract the keyword groups for these four types of information contained in the indicators. The result is four phrase groups rather than four single words because the explanation of some indicators contains expressions such as "such as XX", "including XX" and "XX, etc.".
  • The input of the preprocessing stage is a sentence S_i that defines and explains an indicator.
  • The purpose of word segmentation is to extract from the sentence all words belonging to the above four categories of keywords and to remove unnecessary stop words, obtaining WG (WG denotes the group of keywords).
  • At the part-of-speech tagging stage, important words that carry actual semantics, such as nouns, verbs, quantifiers, adverbs, adjectives and conjunctions, can be identified in WG and assigned to the service content phrase group WG_services, the business activity phrase group WG_business, the evaluation aspect phrase group WG_indicators and the modifier phrase group WG_adjunctword.
  • The dependency/modification relationships between words of different parts of speech are obtained at the dependency syntax analysis stage.
  • By synthesizing the analysis results of all evaluation indicators, the following four types of association relationships can be summarized: (1) which business actions relate to a given service content; (2) who implements a business activity and who receives it; (3) what the specific evaluation aspects of a service content or business activity are; (4) which evaluation aspects are common (considered for most service content or business activities).
  • Dependency syntax analysis can also identify coordinated words joined by conjunctions, so that unimportant words can be further deleted.
  • The above preprocessing can be carried out with natural language processing toolkits such as Stanford CoreNLP and language models trained on large public corpora.
  • Taking the table turnover rate as an example, the original definition of the indicator is as follows: [Table turnover rate; the average number of times each table in a restaurant is used in a day. The table turnover rate is an important indicator of a restaurant's profitability and is closely related to the restaurant's average daily passenger flow; table turnover rate = (number of times tables are used - total number of tables) ÷ total number of tables].
  • The four types of phrases obtained after preprocessing are as follows:
  • WG_services = {restaurant, table, dining room, table};
  • WG_indicators = {number of times, total number of tables, passenger flow};
  • WG_adjunctword = {one day, per table, daily average}.
  • The present invention uses the TF-IDF method to quantify the importance of each word and deletes unimportant words. This importance is also used in the subsequent determination of indicator relationships.
  • The calculation follows the standard TF-IDF formulation:
  • $tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}, \quad idf_i = \log\frac{|D|}{|\{j : n_{i,j} > 0\}|}, \quad tfidf_{i,j} = tf_{i,j} \times idf_i$
  • n_{i,j} is the total number of occurrences of a specific word i in an indicator j
  • n_{k,j} is the total number of occurrences of each word k in the indicator j
  • |D| represents the number of all indicators
  • idf_i denotes the degree of exclusiveness of the word in the explanation of the index.
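  • As an illustration of word weighting, the following Python sketch computes TF-IDF over indicator definitions using only the standard library. It assumes the standard formulation (term frequency n_{i,j} divided by the total word count of indicator j, multiplied by the logarithm of the number of indicators over the number of indicators containing the word); the function name `tf_idf` and the tokenized word lists are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: dict mapping indicator name -> list of words from its definition."""
    n_docs = len(docs)
    # Document frequency: number of indicators whose definition contains the word.
    df = Counter(word for words in docs.values() for word in set(words))
    scores = {}
    for name, words in docs.items():
        counts = Counter(words)        # n_{i,j} for each word i
        total = sum(counts.values())   # sum over k of n_{k,j}
        scores[name] = {
            w: (c / total) * math.log(n_docs / df[w]) for w, c in counts.items()
        }
    return scores

# Hypothetical tokenized definitions, for illustration only.
docs = {
    "table turnover rate": ["table", "use", "times", "day"],
    "dish packaging rate": ["dish", "packaging", "times", "day"],
}
scores = tf_idf(docs)
```

  • Words that occur in every indicator, such as "day" here, receive a weight of zero and are natural candidates for deletion.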
  • Index correlation is directly affected by lexical semantic association.
  • Existing open dictionaries partially meet this need, but most of them include only hyponymy, synonymy, antonymy and same-class relations; related relationships are not included.
  • The present invention summarizes the common lexical semantic relationships of service evaluation indicators. Because no good method yet exists for accurately extracting these relationships from public-domain resources, a rough lexical semantic relationship dictionary and a user-built dictionary are used for the time being.
  • A is a kind of B: A is a hyponym of B, and B is a hypernym of A.
  • A and B have a common abstract parent class in the tree-like hypernym-hyponym hierarchy, for example the pair "dishes" and "meat products".
  • A is the raw material of B, and B is processed from A, for example the pair "dishes" and "ingredients".
  • A is a tool used in business related to B, for example the pair "dish" and "refrigerator".
  • Activity A is the predecessor of activity B, and activity B is the successor of activity A, such as "packaging" and "delivery".
  • A and B belong to the same class of quantifiers and can therefore be converted into each other with a conversion formula, such as "daily average" and "monthly average".
  • On the one hand, the index system builder can configure the "highly-similar judgment threshold TH_hs", the "similar judgment threshold TH_s", the "weakly-similar judgment threshold TH_ls" and the "related judgment threshold TH_r". All thresholds take values between 0 and 1; the related judgment threshold is not otherwise constrained, while the other three must satisfy TH_hs > TH_s > TH_ls. On the other hand, the builder can configure a "lower limit of relationship number" and an "upper limit of relationship number", and the four thresholds are then adjusted automatically so that the number of relationships stays within these limits as far as possible.
  • The present invention expresses the semantic relationships between words as the following six categories:
  • High similarity: the calculated similarity between the words is greater than the highly-similar judgment threshold TH_hs;
  • Antonyms: adjectives that are antonyms of each other in the dictionary, or whose expressed sentiment values sum to approximately 1;
  • LS: hyponymy relationship;
  • NULL: there is neither a highly similar relationship nor a related relationship, or this category of words does not exist in the definition of one of the indicators.
  • These semantic relationships can be determined from the position, number, identifier and dictionary structure of the words in the dictionary.
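  • The threshold scheme can be sketched in Python as below. The default threshold values are made up for illustration, the level names are hypothetical labels rather than the relation codes of the source, and only the ordering constraint TH_hs > TH_s > TH_ls is taken from the text.

```python
def similarity_level(score, th_hs=0.9, th_s=0.7, th_ls=0.5):
    """Map a word-similarity score in [0, 1] to a similarity level.

    The default thresholds are illustrative assumptions, not values
    from the source.
    """
    if not (th_hs > th_s > th_ls):
        raise ValueError("thresholds must satisfy TH_hs > TH_s > TH_ls")
    if score > th_hs:
        return "highly similar"
    if score > th_s:
        return "similar"
    if score > th_ls:
        return "weakly similar"
    return "none"

levels = [similarity_level(s) for s in (0.95, 0.8, 0.6, 0.2)]
```

  • Checking the ordering constraint on every call keeps a misconfigured threshold set from silently producing empty levels.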
  • Step 3 Determining the relationship between indicators
  • The relationships between the four types of words are determined with the help of open public dictionaries.
  • The experiments of the present invention use the synonym thesaurus Cilin, HowNet and the Baidu Chinese dictionary, which contain information such as word frequency, part of speech, synonyms, hypernyms, word codes and related words.
  • Users can also build their own dictionaries as a supplement.
  • To determine whether a semantic relationship exists between two indicators I_n and I_m, first calculate the semantic association between their phrases of the same type k, where k ∈ {services, business, indicators, adjunctword}.
  • The relationship between same-type phrases can be expressed as a matrix, in which:
  • the index I_n contains p words
  • the index I_m contains q words
  • each word has a corresponding TF-IDF value
  • the matrix size is p × q.
  • Each element a_{i,j} of the matrix is a two-tuple <RelationType, Confidence> giving the relation type and confidence between two words, where RelationType ∈ {HS, AN, SY, LS, RE, NULL} and Confidence ∈ [0, 1].
  • The type r_max that maximizes SD_r is the semantic association type of this type of phrase, and the confidence of this semantic association is the mean of the confidences of all matrix elements of that type (other statistics can also be used).
  • n and m denote the index I_n and the index I_m respectively
  • k refers to the four types of keyword groups
  • num refers to the number of words.
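  • The aggregation of the p × q relation matrix into one relation type per phrase group can be sketched as follows. The definition of SD_r did not survive in the text, so this Python sketch assumes SD_r is simply the number of matrix elements of type r; the function name `dominant_relation` and the sample matrix are hypothetical. The confidence is the mean over the elements of the winning type, as described above.

```python
from collections import defaultdict

def dominant_relation(matrix):
    """matrix: p x q nested list of (relation_type, confidence) tuples.

    Returns (r_max, confidence): the most frequent non-NULL relation type
    and the mean confidence of the elements of that type.
    """
    confidences = defaultdict(list)
    for row in matrix:
        for relation, confidence in row:
            if relation != "NULL":
                confidences[relation].append(confidence)
    if not confidences:
        return "NULL", 0.0
    # Assumption: SD_r is taken as the count of elements of type r.
    r_max = max(confidences, key=lambda r: len(confidences[r]))
    values = confidences[r_max]
    return r_max, sum(values) / len(values)

# Illustrative 2 x 2 matrix; the confidences are made up.
matrix = [
    [("HS", 0.9), ("NULL", 0.0)],
    [("RE", 0.6), ("HS", 0.7)],
]
result = dominant_relation(matrix)
```

  • A weighted variant could multiply each element by the TF-IDF values of the two words involved, which would let important words dominate the vote.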
  • Step 4 Optimize the relationship between indicators
  • The present invention defines the following evaluation indicators:
  • The in-degree of a node indicates how dependent the node is within the comprehensive index evaluation system: many related variables or indicators determine or affect the value of the index. The larger the maximum node in-degree, the shallower the structural hierarchy of the index system, the lower its fault tolerance, and the lower the probability of error propagation.
  • The out-degree of a node indicates how important the node is within the comprehensive index evaluation system: the index determines or affects the values of multiple other indicators. The larger the maximum node out-degree, the more complex and unstable the structure of the index system, and the more likely it is that a small change will affect the whole system.
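  • The in-degree and out-degree statistics can be computed directly from the directed edges of the semantic relationship network. In this Python sketch an edge (source, target) is assumed to mean that the source indicator affects the target indicator; the function name `degree_stats` and the sample edges are hypothetical.

```python
from collections import Counter

def degree_stats(directed_edges):
    """directed_edges: iterable of (source, target) indicator pairs."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in directed_edges:
        out_deg[src] += 1  # src affects one more indicator
        in_deg[dst] += 1   # dst depends on one more indicator
    return in_deg, out_deg

# Illustrative edges only.
edges = [
    ("packaging firmness", "transport integrity"),
    ("delivery time", "transport integrity"),
    ("delivery time", "dish quality"),
]
in_deg, out_deg = degree_stats(edges)
max_in = max(in_deg.values())
max_out = max(out_deg.values())
```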
  • The hit rate measures how many of the index semantic relationships mined by the above method are contained in the confirmed set. Here e_j denotes the j-th edge in the indicator semantic relationship network, and the counting function over edges e ("Condition") denotes the number of elements that meet a given condition.
  • This step only analyzes the alignment effect of the above methods in detail. If the proportion of similarity relationships between indicators is high, the index evaluation system is highly redundant; a high proportion of index relationships means that the index system is more detailed.
  • This method depends heavily on the lexicon and on the judgment thresholds for word semantic association, so the semantic alignment obtained from the initial manual input may mine too few relationships or mine incorrect ones.
  • The hit rate, error rate and innovation degree mentioned in the above evaluation of alignment results are all proportional to coverage.
  • The richness of the index content also affects the determination of index relationships. If the index content is too concise (the description of service content, business activities and evaluation aspects is incomplete), indexes are easily classified as the same index. Therefore, if the proportion of similarity relationships between indicators is high and the error rate is also high, the index content can be optimized by supplementing the index explanations.
  • The purpose of the quantitative alignment method of the present invention is to define the space-time boundary and divide the service domain based on sample data of a known index under different space-time boundary conditions, then use kernel density estimation to fit the characteristic distribution of the index on single domains and rich domains, derive the probability distribution function from the fitted probability density function, and finally use quantiles as the benchmark to solve for the corresponding index values under different space-time boundary characteristics.
  • The mapping between a specific index value and the actual service level is neither unique nor constant.
  • The same index value may correspond to different service levels under different space-time boundary conditions, and different service levels under different space-time boundary conditions may correspond to the same index value.
  • The price level and average price of commodities vary significantly between regions.
  • The same average commodity price is high in Harbin but low in Shanghai; delivery efficiency and delivery time likewise differ across time, space, and domain.
  • the efficient delivery time during the off-peak dining period only takes 20 minutes
  • the high-efficiency delivery time during the dining peak period is generally about 30-40 minutes
  • the efficient delivery time at midnight is 50-60 minutes.
  • If the differences in the characteristic distribution of an index across space-time boundaries are not considered, service decision-making and optimization will fail or become unbalanced.
  • If an enterprise adopts a uniform nationwide commodity price adjustment strategy, low-income regions will perceive a marked price rise while high-income regions will not notice a significant difference.
  • the decision maker can perceive the distribution difference of the index value in different time and space boundaries, and formulate a reasonable enterprise decision plan according to the alignment mapping function.
  • the time domain has natural continuity and can be described by interval numbers.
  • the specific definition is as follows:
  • [T_start, T_end]: take a moment in the past or the current moment as T_start, and define a specific deadline as T_end;
  • [T_start, T_end]_period: define fixed T_start and T_end, and define a clock period, period;
  • [N_i, N_j]_slice: define a fixed time slice, slice, starting with the N_i-th slice and ending with the N_j-th slice;
  • [T_E-start, T_E-end]_Event: take the occurrence of an event as T_E-start and the end of the event's influence as T_E-end, where Event is the triggering event in the time domain.
  • the spatial domain is the geographic domain, which can be described in the form of set algebra.
  • the specific definition is as follows:
  • Location: (1) a geographic location with latitude and longitude attributes; (2) streets, business districts, communities, etc. with proper names; (3) names of provinces and municipalities determined according to the national administrative divisions.
  • Regional attributes can be ranked by regional advantages (such as regional economic development, population density, education level, consumption index, etc.), and each region will correspond to a Rank value, thereby determining the partial order relationship.
  • The generalized domain divides the service domain into several sub-domains according to a boundary rule, highlighting the characteristics of the different sub-domains as well as the fusion and transition between sub-domains that come with business optimization and service collaboration.
  • Boundary rules can be formulated according to the industry field, service content and nature, and the technology platform on which service execution depends.
  • The traditional definition of service boundaries is limited to the management boundaries between autonomous organizations; other boundaries are treated as the separation of technology platforms and service content caused by organizational boundaries.
  • Organizational boundaries alone are not enough to fully describe service boundaries. Richer service boundaries must be defined to provide a basis for judgment in service collaboration and integration.
  • Step 3 Calculate the alignment relationship of the indicators in terms of quantitative methods
  • In step 2, we obtained the characteristic distribution of the indices in service domains with different space-time boundaries.
  • The quantile α is used as the alignment reference. Assume that the index I follows two distributions, cdf(I_a) and cdf(I_b), on the two service domains a and b.
  • With α ∈ [0, 1] as the independent variable, each quantile α′ corresponds to two index values i′_a and i′_b, which establishes the correspondence between the index values on the two service domains, as shown in Figure 9.
  • the alignment of multiple space-time boundary indicators is also established on the basis of quantiles.
  • The service level can be converted into a number in [0, 1], from which the specific index value corresponding to a given service level under different space-time boundary conditions can be obtained.
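
The quantile-based alignment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the Gaussian kernel, the normal-reference rule-of-thumb bandwidth, the bisection search, and all function names (`kde_cdf`, `rule_of_thumb_bandwidth`, `quantile`, `align`) are choices made for this example, since the source only states that kernel density estimation and quantiles are used. The quantile is called `alpha` in the code.

```python
import math
from statistics import stdev


def rule_of_thumb_bandwidth(samples):
    # Normal-reference rule of thumb. This is an assumption: the source
    # does not say how the kernel bandwidth is chosen.
    return 1.06 * stdev(samples) * len(samples) ** (-0.2)


def kde_cdf(samples, bandwidth):
    """Return the CDF of a Gaussian kernel density estimate of the samples."""
    scale = bandwidth * math.sqrt(2.0)

    def cdf(x):
        # The CDF of a Gaussian KDE is the mean of the per-sample normal CDFs.
        return sum(0.5 * (1.0 + math.erf((x - s) / scale)) for s in samples) / len(samples)

    return cdf


def quantile(cdf, alpha, lo, hi, tol=1e-6):
    """Invert a monotone CDF on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def align(samples_a, samples_b, value_a):
    """Map an index value in domain a to the value at the same quantile in domain b."""
    h_a = rule_of_thumb_bandwidth(samples_a)
    h_b = rule_of_thumb_bandwidth(samples_b)
    alpha = kde_cdf(samples_a, h_a)(value_a)
    cdf_b = kde_cdf(samples_b, h_b)
    # Search well beyond the sample range so the bracket contains the quantile.
    span = 10.0 * h_b
    value_b = quantile(cdf_b, alpha, min(samples_b) - span, max(samples_b) + span)
    return alpha, value_b


# Hypothetical delivery times in minutes (made-up data for illustration only).
off_peak = [18.0, 19.0, 20.0, 21.0, 22.0]
peak = [30.0, 33.0, 35.0, 37.0, 40.0]
alpha, peak_equivalent = align(off_peak, peak, 20.0)
```

With these made-up samples, a 20-minute delivery in the off-peak domain sits at the median of its distribution and is therefore aligned with the median of the peak-period distribution, which is the sense in which the same service level corresponds to different index values in different domains.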

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Resources & Organizations (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Educational Administration (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Software Systems (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

A time-space boundary-oriented multi-party service value-quality-capability index alignment method. The method is divided into two parts: domain feature-oriented multi-participant service value-quality-capability evaluation index semantic alignment and time space boundary feature-oriented multi-participant service value-quality-capability evaluation index quantification method alignment. The method does not rely on the construction of an ontology, but uses common means of natural language processing to extract key vocabulary contained in a sentence defined and explained by an indicator, and, by virtue of vocabulary information contained in public dictionaries and domain dictionaries and a morpheme relationship, mines the correlation between different indicators. In terms of quantification method alignment, the method summarizes the factors that lead to quantification methods being inconsistent in the collaborative process of multiple participants, and considers, from the perspective of time and space, the mapping between a specific value of an index in a multi-dimensional service implementation environment and the actual service level requiring expression, thus achieving index quantification method alignment.

Description

Time-space boundary-oriented multi-party service value-quality-capability index alignment method
Technical field
The invention belongs to the technical field of enterprise interoperability in software engineering, in particular to the field of non-functional attribute alignment for multi-participant services, and relates to a time-space boundary-oriented multi-party service value-quality-capability index alignment method.
Background art
Enterprise interoperability is a prerequisite for service participants to exchange data and share information, reach consensus on service requirements and service goals, and establish stable collaborative relationships and reliable collaboration modes. The European Interoperability Framework for Pan-European eGovernment Services (EIF) identifies three types of interoperability: organizational, technical, and semantic. Organizational interoperability concerns enterprise organizational structure and business implementation processes, and can be addressed with modeling specifications and model transformation methods. Technical interoperability concerns interaction interfaces and data integration, representation, and exchange, and usually relies on standardized metadata formats and meanings as a reference for making data consistent. Semantic interoperability eliminates inconsistencies in the information exchanged between different enterprises. Service evaluation indices are statistical indices for measuring and evaluating service value, quality, and capability; they provide effective reference information for service decision-making and optimization, and are an important subject of negotiation when service providers establish cooperative relationships. Evaluation indices contain both rich semantic information and detailed qualitative and quantitative descriptions. Different participants follow domain-specific norms and habits in defining, interpreting, quantifying, and weighting indices. A prerequisite for multi-domain, multi-participant collaboration is therefore to align the semantics and quantification methods of the parties' service evaluation indices, so that during collaboration each party can accurately understand what the other parties' indices express and what their values mean.
Traditional research on semantic interoperability of heterogeneous enterprise models mainly takes an ontology as the semantic model. A domain ontology is established through ontology construction or reconstruction techniques (ontology hybridization, synthesis, mutation, etc.) to provide a semantic reference for model interoperability. On this basis, ontology-based model semantic mapping rules and strategies achieve semantic alignment between heterogeneous enterprise models, including term alignment, concept granularity alignment, perspective alignment, and coverage alignment, although these alignment schemes cannot align the way indices are measured. They resolve semantic conflicts between heterogeneous models, such as the same name with different meanings, the same meaning with different names, and inconsistent concept scope, and ultimately enable information sharing and business collaboration within an alliance. This approach has three major shortcomings: (1) Model semantic interoperability rests on the construction of a domain ontology, whose hierarchy, relatedness, authority, completeness, and consistency directly affect the quality of semantic alignment. Existing ontology construction schemes and tools make ontology building very challenging, especially for vertical domains, where the accuracy and completeness of the ontology are hard to guarantee. (2) Existing open ontology resources generally define concepts and instances only as nouns, but service evaluation is inseparable from business activities and evaluation aspects, which do not exist as concepts in an ontology. Moreover, concept attributes and concept relationships are not mined sufficiently: although the overall amount of information is large, for any one narrow concept many related concepts and instances are missing. (3) In addition, alignment at the semantic level alone cannot ensure the consistency of shared information, and existing work pays little attention to aligning how indices are quantified.
Summary of the invention
In view of the above shortcomings of the prior art, the present invention provides a space-time boundary-oriented method for aligning multi-party service evaluation indices.
The object of the present invention is achieved through the following technical solution:
A time-space boundary-oriented multi-party service value-quality-capability index alignment method, comprising the following steps:
Step 1: extract from the index definition the keyword groups covering service content, business activities, index evaluation aspects, and index evaluation rules, wherein:
the index definition includes the index name, abbreviation/idiom, English abbreviation, index explanation, preferred direction, dimension (unit + order of magnitude), value range, and calculation formula;
the four types of keyword groups are: ① service content, including service providers (personnel roles, system tools, software applications, etc.), service carriers (commodities, orders, knowledge, data, etc.), and the service execution environment and context, generally noun phrases; ② business activities, including the specific actions of service providers and the detailed handling of service carriers, generally verb phrases; ③ evaluation aspects, including modifiers of service content and business activities, generally of the form XX rate | proportion | share, XX effect | degree, XX size | speed | load, etc.; ④ evaluation rules, including index evaluation criteria, weights, frequencies, and other statistical units, such as the quantifiers daily average, monthly average, per capita, quarterly, and annual;
Step 2: using public dictionaries, domain dictionaries, and a self-built dictionary, compute the morpheme relationships between the four types of keyword groups of each pair of indices to obtain the semantic similarity matrix between indices, wherein:
the public dictionaries include Tongyici Cilin (extended edition), the HowNet dictionary, and the Baidu Chinese dictionary;
the domain dictionaries include the Sogou industry lexicon and the Baidu industry lexicon; each contains six fields (concept identifier, concept name, synonyms, English name, semantic description, and application domain) and is a list of domain-specific concepts built by domain experts from their understanding and experience of the domain;
the definition of a phrase in the self-built dictionary includes several of the following: ID, phrase, part of speech, category (one of service content, business activity, index evaluation aspect, index evaluation rule), near-synonyms, antonyms, same-class words, hypernyms, hyponyms, causally related phrases, belonging/source-related phrases, usage/tool-related phrases, composition/part-whole-related phrases, and execution-dependency-related phrases;
the morpheme relationships are of four kinds: similar (highly similar), near (weaker than similar), related, and same-class;
the semantic similarity matrix is a two-dimensional matrix whose two dimensions are the four types of keyword group sets of the two indices;
Step 3: determine the semantic relationship between indices from the semantic similarity matrix and compute the relationship confidence, wherein:
the semantic relationships include similarity relationships (① same index; ② conjugate index; ③ superior-subordinate index), correlation relationships (④ service-content related; ⑤ business related; ⑥ index related), and same-class indices (⑦ same-class service evaluation aspect; ⑧ same-class business; ⑨ same-class service content);
Step 4: determine the semantic relationships of all indices as in Step 3 to obtain a semantic relation network, then delete redundant edges according to the direction and number of the semantic relationships between indices to simplify the network, wherein:
the semantic relation network is a network with indices as nodes and the semantic relationships between indices as edges; the edge attributes are the semantic relationship type and the confidence; edges are either directed or undirected, and among the semantic relationships ③ superior-subordinate index and ⑤ business related are directed;
Step 5: fit the distribution characteristics of an index on single domains and rich domains from sample data of the index under different space-time boundaries, wherein:
time refers to different time domains, space refers to different geographic domains, and boundary refers to different service implementation environments (online or offline), different service implementation platforms, or different service participants;
the single-domain distribution characteristic is the probability distribution of the index on one service domain, and the rich-domain distribution characteristic is the probability distribution of the index on two or more service domains;
Step 6: establish the alignment relationship of index quantification methods using probability quantiles as the reference, wherein:
the alignment relationship of index quantification methods means solving for the range of index values corresponding to a given service level under different space-time boundary characteristics, or determining the service level corresponding to an index value under a specific space-time boundary.
Compared with the prior art, the present invention has the following advantages:
Unlike traditional ontology-based methods for semantic interoperability of enterprise models, the present invention does not rely on building an ontology. Instead, it uses common natural language processing techniques to extract the key vocabulary contained in the sentences that define and explain each index, and mines the correlations between different indices from the vocabulary information and morpheme relationships contained in public and domain dictionaries. For quantification method alignment, the invention summarizes the factors that make quantification methods inconsistent when multiple participants collaborate, and considers, from the perspective of time, space, and boundary, the mapping between the specific value of an index in a multi-dimensional service implementation environment and the service level it is actually meant to express, thereby aligning index quantification methods.
Description of the drawings
Figure 1 shows the framework of the time-space boundary-oriented multi-party service value-quality-capability index alignment method of the present invention;
Figure 2 shows the method framework for domain feature-oriented semantic alignment of multi-participant service value-quality-capability indices;
Figure 3 shows the method framework for spatiotemporal feature-oriented alignment of the quantification methods of multi-participant service value-quality-capability indices;
Figure 4 shows the principles for determining index relationships in the semantic alignment stage;
Figure 5 is an example of keyword parsing for domain feature-oriented service evaluation indices;
Figure 6 is a schematic diagram of domain feature-oriented semantic alignment of multi-participant service evaluation indices;
Figure 7 is an example of the single-domain distribution characteristics of an index, oriented to spatiotemporal features;
Figure 8 is an example of the rich-domain distribution characteristics of an index, oriented to spatiotemporal features;
Figure 9 is a theoretical diagram of spatiotemporal feature-oriented alignment of the quantification methods of multi-participant service evaluation indices.
Detailed description of the embodiments
The technical solution of the present invention is further described below with reference to the drawings, but is not limited thereto. Any modification or equivalent replacement of the technical solution that does not depart from its spirit and scope shall fall within the protection scope of the present invention.
The present invention provides a time-space boundary-oriented multi-party service value-quality-capability index alignment method in two parts: domain feature-oriented semantic alignment of multi-participant service evaluation indices, and space-time boundary feature-oriented alignment of the quantification methods of multi-participant service evaluation indices. The framework is shown in Figures 1-3.
The purpose of semantic alignment in the present invention is as follows. Given a known multi-domain, multi-participant service value-quality-capability evaluation index system, the key elements of each index are extracted with natural language processing techniques; the semantic relationships between the four types of phrases are then computed using public dictionaries, domain dictionaries, and the self-built dictionary; finally, the semantic relationships between indices are determined from the vocabulary relationship matrix and their confidence is computed, yielding a multi-domain, multi-participant index semantic relation network. From this network each participant can learn how its own service indices relate to those of other parties. These relationships are not limited to cases of the same name with different meanings or the same meaning with different names; richer semantic relationships can also be mined.
The original index definition includes the index name, abbreviation/idiom, English abbreviation, index explanation, preferred direction, dimension (unit + order of magnitude), value range, calculation formula, and so on. Abbreviations/idioms and English abbreviations are highly domain-specific and must be understood with the help of the explanations in a domain dictionary. Index names and explanations lack standardization: naming conventions and the level of detail of explanations differ between participants. The calculation formula also implies relationships between indices. To remove these irregularities in index definitions, the first step of the invention preprocesses the indices: natural language processing techniques such as word segmentation, part-of-speech tagging, dependency parsing, and word frequency statistics extract the key elements of each index and discard words that are hard to understand or irrelevant to service evaluation, producing four types of phrases: [service content, business activity, index evaluation aspect, index evaluation rule].
Service content: includes the personnel roles involved in service implementation, the resources on which service execution depends, and the tangible products or valuable knowledge and information that accompany service delivery; generally expressed by proper nouns.
Business activity: verbs related to business execution, referring to actions performed by personnel roles or automated mechanical systems; generally expressed by verbs.
Index evaluation aspect: nouns that describe or modify service content or business activities, generally with specific suffixes such as XX rate (XX率), XX degree (XX程度), XX effect (XX效果), or XX-ness (XX性).
Index evaluation rule: an evaluation index has a specific evaluation frequency and object, such as daily average, monthly average, or annual average; or per person-time, per order, or per case.
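
As a rough illustration of how the four phrase types might be separated once an index name has been segmented and part-of-speech tagged, the sketch below applies simple suffix and word-list rules. The function name `classify_tokens`, the suffix list, the rule-word list, and the tag convention (tags beginning with `n` for nouns and `v` for verbs) are assumptions made for this example; the source does not specify the extraction rules at this level of detail, and it also relies on dependency parsing and word frequency statistics, which are not modeled here. The tokens are Chinese because the suffix cues apply to Chinese index names.

```python
from typing import Dict, List, Tuple

# Suffixes that typically mark an evaluation aspect (assumed list).
ASPECT_SUFFIXES = ("率", "程度", "效果", "性", "比例", "占比", "速度")
# Words that typically mark an evaluation rule (assumed list).
RULE_WORDS = {"日均", "月均", "年均", "人均", "每人次", "每单", "每宗", "季度", "年度"}


def classify_tokens(tokens: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Sort (word, pos_tag) pairs into the four keyword groups."""
    groups: Dict[str, List[str]] = {
        "service_content": [],
        "business_activity": [],
        "evaluation_aspect": [],
        "evaluation_rule": [],
    }
    for word, pos in tokens:
        if word in RULE_WORDS:
            groups["evaluation_rule"].append(word)
        elif word.endswith(ASPECT_SUFFIXES):
            groups["evaluation_aspect"].append(word)
        elif pos.startswith("v"):
            groups["business_activity"].append(word)
        elif pos.startswith("n"):
            groups["service_content"].append(word)
        # Anything else is treated as irrelevant to service evaluation.
    return groups


# "Daily average dish packaging accuracy", already segmented and tagged by hand.
example = classify_tokens([("日均", "m"), ("菜品", "n"), ("打包", "v"), ("准确率", "n")])
```

The rule and suffix checks run before the part-of-speech checks because evaluation aspects are themselves nouns and would otherwise be filed under service content.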
The index relationships in the present invention are determined mainly from three kinds of dictionaries: public dictionaries, domain dictionaries, and a self-built dictionary. The richness of a dictionary's vocabulary, the level of detail of its word relationships and word explanations, and whether its vocabulary has an organizational structure all affect the reliability of the results. The invention therefore selects Tongyici Cilin (extended edition), the HowNet dictionary, and the Baidu Chinese dictionary as reference public dictionaries, and the Sogou industry lexicon and the Baidu industry lexicon as reference domain dictionaries. The self-built dictionary contains attributes such as ID, phrase, part of speech, category (one of service content, business activity, evaluation aspect, evaluation rule), near-synonyms, antonyms, same-class words, hypernyms, hyponyms, causally related phrases, belonging/source-related phrases, usage/tool-related phrases, composition/part-whole-related phrases, and execution-dependency-related phrases. The information in these dictionaries is then combined to compute the relationships between the four types of phrases.
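
One possible shape for a self-built dictionary entry, together with a lookup that returns a coarse relation label for a pair of phrases, is sketched below. The class `LexiconEntry`, the function `lookup_relation`, the field names, and the label strings are hypothetical and cover only a subset of the attributes listed above; the graded "near" relation, which would need similarity scores from the public dictionaries, is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class LexiconEntry:
    """One phrase in the self-built dictionary (subset of the listed attributes)."""
    phrase: str
    pos: str
    category: str  # service_content | business_activity | evaluation_aspect | evaluation_rule
    synonyms: Set[str] = field(default_factory=set)
    antonyms: Set[str] = field(default_factory=set)
    same_class: Set[str] = field(default_factory=set)
    hypernyms: Set[str] = field(default_factory=set)
    hyponyms: Set[str] = field(default_factory=set)
    # Causal, source, tool, part-whole, and dependency links merged for brevity.
    related: Set[str] = field(default_factory=set)


def lookup_relation(a: str, b: str, lexicon: Dict[str, LexiconEntry]) -> Optional[str]:
    """Return how phrase b relates to phrase a, or None if the lexicon does not say."""
    if a == b:
        return "similar"
    entry = lexicon.get(a)
    if entry is None:
        return None
    checks = [
        ("similar", entry.synonyms),
        ("antonym", entry.antonyms),
        ("hypernym", entry.hypernyms),  # b is a hypernym of a
        ("hyponym", entry.hyponyms),    # b is a hyponym of a
        ("related", entry.related),
        ("same_class", entry.same_class),
    ]
    for label, words in checks:
        if b in words:
            return label
    return None


# Hypothetical entry: "fresh produce" (生鲜) recorded as a hyponym of "commodity" (商品).
lexicon = {"商品": LexiconEntry("商品", "n", "service_content", hyponyms={"生鲜"})}
relation = lookup_relation("商品", "生鲜", lexicon)
```

Checking the closer relations first means that a pair recorded under several attributes is reported with its strongest label.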
本发明针对指标间语义层面上的相关关系,定义了三大类九小类关系,其中:九类关系解释如下:The present invention defines three major categories and nine sub-categories for the correlation between indexes on the semantic level, wherein: the nine categories of relations are explained as follows:
一、相似关系1. Similar relationship
1、同一指标:指服务内容、业务活动、指标评价侧面和修饰词都能对应上,且均语义高度相似。eg.菜品打包速率、菜品打包效率。1. The same indicator: It means that the service content, business activities, indicator evaluation aspects and modifiers can all correspond, and all have highly similar semantics. eg. Food packaging rate, food packaging efficiency.
2、共轭指标:指服务内容、业务活动高度相似,但指标评价侧面互为反义词。eg.餐厅卫生干净程度,就餐环境脏乱程度。2. Conjugate index: It means that the service content and business activities are highly similar, but the evaluation aspects of the index are antonyms to each other. eg. The cleanliness of the restaurant and the degree of clutter in the dining environment.
3、上下级指标:3. Subordinate and subordinate indicators:
指业务活动和指标评价侧面高度相似,但服务内容之间存在上下 级关系(词A是词B的组成成分,或词A是词B的一种子类)。eg.商品次品率,生鲜次品率。It means that business activities and index evaluation are highly similar, but there is a subordinate relationship between service contents (word A is a component of word B, or word A is a subcategory of word B). eg. Commodity defective rate, fresh defective rate.
II. Correlation relationships
4. Service-content correlation: the business activities are similar (where both exist), the evaluation aspects are close (a weaker degree of likeness than "similar"), and the service contents are related in some way. For example, chef health status and dish hygiene: dishes are made by the chef, and health and hygiene are close concepts.
5. Business correlation: the service contents are similar, the evaluation aspects are close, and the business activities are related in some way. For example, dish packing firmness and dish transport intactness: packing is the predecessor activity of transport, and firmness and intactness are close concepts.
6. Indicator correlation: neither the service contents nor the business activities are obviously related, but the indicator description contains accompanying expressions such as "as XXX ..." or "the more XX, the more XX", which indicates that the two indicators are correlated. If the two change in the same direction, the correlation is positive; otherwise it is negative. For example, dish delivery time and the degree of dish quality preservation: the longer the delivery takes, the worse the quality preservation, so the two are negatively correlated.
III. Same-class relationships
7. Same-class indicators (same evaluation aspect): the evaluation aspects are similar, but the service contents and business activities are neither similar nor related, or no service content or business activity could be extracted. In this case a same-class relationship can be defined loosely. E.g., dish packing accuracy and order accounting accuracy.
8. Same-class business: the business activities are similar, but the service contents and evaluation aspects are neither similar nor related. E.g., dish packing accuracy and dish packing firmness.
9. Same-class service content: the service contents are similar, but the business activities and evaluation aspects are neither similar nor related. E.g., commodity storage time and proportion of finely processed commodities.
The closeness of the nine relationship types above decreases in the order listed. Relationships may be misjudged for two reasons: ① the indicator definition lacks the needed information; ② the training corpus is limited and has low coverage, so that word meanings or word-to-word relationships are determined incorrectly. When reviewing indicators for which the semantic alignment stage failed to establish a relationship automatically, or unrelated indicators between which a relationship was wrongly established, the last three relationship types deserve the most attention. On the one hand, one can lower the confidence level used in the correlation determination or enrich the indicator explanations to improve the accuracy of the determination; on the other hand, one can review these relationships and add or delete them manually. After optimization, a semantic relationship network is obtained in which each node represents an indicator and each edge carries a semantic relationship and its confidence.
The method of the present invention for domain-feature-oriented semantic alignment of multi-participant service value-quality-capability evaluation indicators is implemented in the following steps:
Step 1. Evaluation indicator preprocessing
A statistical analysis of indicator content shows that the service content, business activity, evaluation aspect and evaluation rule of an indicator are sufficient to determine its evaluation object, focus and evaluation scope. The main task of the preprocessing stage is therefore to extract, from each indicator, the keyword groups for these four kinds of information. Groups are extracted rather than four single words because some indicator explanations contain expressions such as "for example XX", "including XX" or "XX and so on".
The input to the preprocessing stage is a sentence S_i that defines and explains an indicator. Word segmentation extracts from the sentence all words belonging to the four keyword categories above and removes unnecessary stop words, yielding WG (the set of keywords). Part-of-speech tagging then identifies in WG the nouns, verbs, quantifiers, adverbs, adjectives, conjunctions and other words that carry actual meaning, giving the service-content group WG_services, the business-activity group WG_business, the evaluation-aspect group WG_indicators and the modifier group WG_adjunctword. Dependency parsing yields the dependency and modification relations between words of different parts of speech. Combining the parsing results of all evaluation indicators, the following four kinds of association between words can be summarized: ① which business actions are associated with a given service content; ② which actors can carry out a given business activity and which recipients it has; ③ which specific evaluation aspects a given service content or business activity has; ④ which evaluation aspects are common (considered for most service contents or business activities). In addition, dependency parsing makes explicit the coordinated words joined by conjunctions, which allows unimportant words to be pruned further.
All of the preprocessing above can be done with natural language processing toolkits such as Stanford CoreNLP and with language models trained on large public corpora. Take the table turnover rate as an example. Its original definition is: [table turnover rate; the average number of times each table in a restaurant is used in one day; the table turnover rate is an important indicator of restaurant profitability and is closely related to the restaurant's average daily customer flow; (number of table uses - total number of tables) ÷ total number of tables]. The four keyword groups obtained after preprocessing are:
WG_services = {restaurant, table, dining room, dining table};
WG_business = {use|2, profit};
WG_indicators = {number of times, total number of tables, customer flow};
WG_adjunctword = {one day, per table, daily average}.
Experiments showed that with the above processing alone, some indicators still yield a large number of words, which makes the subsequent relationship determination computationally expensive. The keyword groups can therefore be reduced further by rules. The present invention uses the TF-IDF method to quantify the importance of each word and prunes unimportant words; this importance value is also used in the subsequent relationship determination. The formulas are:
Figure PCTCN2021089373-appb-000001
Figure PCTCN2021089373-appb-000002
tf-idf_{i,j} = tf_{i,j} × idf_i;
where n_{i,j} is the total number of occurrences of a specific word i in indicator j, n_{k,j} is the total number of occurrences of word k in that indicator, |D| is the total number of indicators, |{j : t_i ∈ d_j}| is the number of indicators containing word t_i, tf_{i,j} is the importance of the word within this indicator's explanation, and idf_i is how specific the word is to this indicator's explanation.
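The weighting described above can be sketched in a few lines. This is only an illustration: the formula images are not reproduced here, so the sketch assumes the textbook form tf = n_{i,j} / Σ_k n_{k,j} and idf = log(|D| / |{j : t_i ∈ d_j}|), and the function name `tf_idf` and the sample keyword lists are hypothetical.

```python
import math
from collections import Counter


def tf_idf(indicators):
    """indicators: dict mapping an indicator name to its list of keywords."""
    n_docs = len(indicators)
    # Document frequency: number of indicators that contain each word.
    doc_freq = Counter()
    for words in indicators.values():
        doc_freq.update(set(words))
    scores = {}
    for name, words in indicators.items():
        counts = Counter(words)
        total = sum(counts.values())
        # Assumed textbook form: (n_ij / sum_k n_kj) * log(|D| / df_i).
        scores[name] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores


# Hypothetical keyword lists, for illustration only.
sample = {
    "table_turnover": ["table", "use", "use", "count"],
    "packing_accuracy": ["dish", "packing", "accuracy"],
    "order_accuracy": ["order", "accounting", "accuracy"],
}
weights = tf_idf(sample)
```

Words shared by many indicators (here "accuracy") receive a lower weight than words specific to one indicator, which is what makes the score usable for pruning.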
Step 2. Custom additional inputs
The determination of indicator correlation is directly affected by the semantic associations between words. Existing open dictionaries partly meet this need, but most cover only hypernym-hyponym, synonym, antonym and same-class relations; more complex associations are not included. The present invention summarizes the lexical semantic associations common in service evaluation indicators, but no sufficiently accurate method exists for extracting them from public sources, so for now a coarse lexical association dictionary and user-built dictionaries are used instead.
The semantic associations between service contents are as follows:
① Hypernym-hyponym relationship (a-kind-of): A is a kind of B; A is the hyponym of B and B is the hypernym of A. For example, "ingredients" and "meat products".
② Part-whole relationship (a-part-of): A is a part of B and B contains A; A is the part and B is the whole. For example, "dishes" and "drinks".
③ Same-class relationship: A and B share a common abstract parent in the hypernym-hyponym tree. For example, "dishes" and "meat products".
④ Similarity relationship (same thing, different names): the meanings of A and B are highly similar or equivalent. For example, "supermarket" and "shopping mall".
⑤ Correlation relationships:
Source-related: A is the raw material of B, and B is processed from A. For example, "dishes" and "ingredients".
Use/tool-related: A is a tool used in business related to B. For example, "dishes" and "refrigerated box".
Composition-related: A is a component that B must include. For example, "delivery vehicle" and "insulated box".
The semantic associations between business activities are as follows:
① Sequential dependency: activity A precedes activity B, and activity B succeeds activity A. For example, "packing" and "delivery";
② Synchronization dependency: activities A and B must be synchronized at the same time or place before subsequent activities can start; otherwise one party has to wait. For example, "dish packing completed" and "rider arrives at the restaurant";
③ Compensation dependency: a failure of activity A triggers the execution of activity B; if activity A succeeds, activity B is not executed. For example, "confirm receipt" and "after-sales service".
The semantic associations between evaluation aspects are as follows:
① Synonym relationship: A and B express the same or similar concepts. For example, "correct rate" and "accuracy rate";
② Conjugate relationship: A and B express opposite concepts. For example, "error rate" and "accuracy rate".
The semantic associations between evaluation rules are as follows:
① Conversion relationship: A and B are quantifiers of the same kind, so one can be converted into the other with a conversion formula. For example, "daily average" and "monthly average".
In addition, because service participants follow different criteria when defining their indicator systems and the quality of self-built dictionaries varies, several configurable parameters are exposed to keep automatic alignment trustworthy, so that existing indicator relationships are not lost and incorrect ones are not mined. Two options are provided. First, the builder of the indicator system can configure the similarity threshold TH_hs, the closeness threshold TH_s, the same-class threshold TH_ls and the correlation threshold TH_r. All thresholds take values between 0 and 1; the correlation threshold is otherwise unconstrained, while the other three must satisfy TH_hs > TH_s > TH_ls. Larger thresholds yield fewer mined relationships with higher confidence. Second, a lower limit and an upper limit on the number of relationships can be configured, and the four thresholds are then adjusted automatically so that the number of relationships is preserved as far as possible.
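As a small sketch of the first option, the four thresholds and their ordering constraint could be held in a validated configuration object. The class name `Thresholds`, the field names and the numeric values are assumptions made for illustration; only the range 0 to 1 and the ordering TH_hs > TH_s > TH_ls come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    th_hs: float  # similarity threshold TH_hs
    th_s: float   # closeness threshold TH_s
    th_ls: float  # same-class threshold TH_ls
    th_r: float   # correlation threshold TH_r (no ordering constraint)

    def __post_init__(self):
        # Every threshold must lie between 0 and 1.
        for name in ("th_hs", "th_s", "th_ls", "th_r"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")
        # The three similarity-style thresholds must be strictly ordered.
        if not (self.th_hs > self.th_s > self.th_ls):
            raise ValueError("require TH_hs > TH_s > TH_ls")


# Hypothetical values, for illustration only.
config = Thresholds(th_hs=0.9, th_s=0.7, th_ls=0.5, th_r=0.6)
```

Rejecting an invalid configuration at construction time keeps a mis-ordered threshold set from silently changing which relationships are mined.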
Summarizing the semantic relationships above, the present invention expresses them as the following six types:
1. Highly similar (HS): the computed similarity between the words is greater than the similarity threshold TH_hs;
2. Antonyms (AN): adjectives that are antonyms of each other in the dictionary, or whose expressed sentiment values sum to approximately 1;
3. Near-synonyms (SY): the computed similarity between the words is less than the similarity threshold TH_hs but greater than the closeness threshold TH_s;
4. Hypernym-hyponym (LS): nouns that stand in a hypernym-hyponym relationship in the dictionary;
5. Related (RE): the words are related in the dictionary (semantic correlations exist both between service contents and between business activities);
6. NULL: the words are neither highly similar nor related, or one of the indicator definitions contains no word of this category.
These semantic relationships can be determined from a word's position, number and identifier in the dictionary and from the dictionary structure.
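The six types can be read as a small decision function over a similarity score and a few dictionary lookups. The sketch below is one possible reading, not the patent's procedure: the order in which the checks are applied is an assumption (the text does not state a precedence), and the function name `classify_pair` and its boolean flags, which stand in for dictionary lookups, are hypothetical.

```python
def classify_pair(similarity, th_hs, th_s, *, antonym=False,
                  hypernym=False, related=False):
    """Return one of HS, AN, SY, LS, RE, NULL for a pair of words.

    The boolean flags stand in for dictionary lookups; the precedence of
    the checks below is an assumption made for this sketch.
    """
    if antonym:
        return "AN"
    if similarity > th_hs:
        return "HS"
    if similarity > th_s:
        return "SY"
    if hypernym:
        return "LS"
    if related:
        return "RE"
    return "NULL"


# Hypothetical similarity scores and thresholds, for illustration only.
label_similar = classify_pair(0.95, th_hs=0.9, th_s=0.7)
label_close = classify_pair(0.80, th_hs=0.9, th_s=0.7)
label_hyper = classify_pair(0.40, th_hs=0.9, th_s=0.7, hypernym=True)
label_none = classify_pair(0.10, th_hs=0.9, th_s=0.7)
```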
Step 3. Indicator relationship determination
First, the relationships between the four kinds of words are determined with the help of open public dictionaries. The experiments of the present invention used Tongyici Cilin (同义词林), HowNet and the Baidu Chinese dictionary, which provide word frequency, part of speech, synonyms, hypernyms, word codes, related words and similar information; users can additionally supply self-built dictionaries. Let I be the set of all indicators and I_n one evaluation indicator in it; preprocessing yields its four keyword groups
Figure PCTCN2021089373-appb-000003
To determine whether some semantic relationship exists between two indicators I_n and I_m, first compute the semantic association between keyword groups of the same category
Figure PCTCN2021089373-appb-000004
with k ∈ {services, business, indicators, adjunctword}. The relationships between two groups of the same category can be represented by a matrix
Figure PCTCN2021089373-appb-000005
as follows:
Figure PCTCN2021089373-appb-000006
where indicator I_n contains p words and indicator I_m contains q words, each word has a corresponding TF-IDF value, and the matrix has size p×q. Each element a_{i,j} of the matrix is a pair <RelationType, Confidence> giving the relationship type between the two words and its confidence, where RelationType ∈ {HS, AN, SY, LS, RE, NULL} and Confidence ∈ [0,1].
Next, the support of each word-level semantic relationship type R_r is computed. As shown in the formula below, for all a_{i,j} with a_{i,j}.RelationType = R_r, the products of the TF-IDF values of the corresponding words w_i and w_j are summed; the sum is the support SD_r of type R_r.
Figure PCTCN2021089373-appb-000007
The type r_Max with the largest SD_r is taken as the semantic relationship type of this keyword-group category. The confidence of this semantic relationship
Figure PCTCN2021089373-appb-000008
is the mean of the confidences of all matrix elements of that type (other statistics may also be used).
Figure PCTCN2021089373-appb-000009
where n and m denote the indicators I_n and I_m, k denotes one of the four keyword-group categories, and num denotes the number of words in the group shown in
Figure PCTCN2021089373-appb-000010
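The support computation can be illustrated as follows. This sketch follows the prose description only (sum of TF-IDF products per relationship type, then the mean confidence of the winning type); the exact formulas are in the figures, which are not reproduced here. The function name `group_relation`, the sample matrix and the weights are hypothetical.

```python
from collections import defaultdict


def group_relation(matrix, weights_n, weights_m):
    """matrix[i][j] is a (relation_type, confidence) pair for words i and j.

    weights_n and weights_m hold the TF-IDF values of the words of the two
    indicators. Returns (best_type, mean_confidence, support_by_type).
    """
    support = defaultdict(float)
    confidences = defaultdict(list)
    for i, row in enumerate(matrix):
        for j, (rel_type, conf) in enumerate(row):
            # Support of a type: sum of TF-IDF products over its elements.
            support[rel_type] += weights_n[i] * weights_m[j]
            confidences[rel_type].append(conf)
    if not support:
        return "NULL", 0.0, {}
    best = max(support, key=support.get)
    mean_conf = sum(confidences[best]) / len(confidences[best])
    return best, mean_conf, dict(support)


# Hypothetical 2x2 relationship matrix and TF-IDF weights.
matrix = [
    [("HS", 0.9), ("NULL", 0.0)],
    [("NULL", 0.0), ("SY", 0.6)],
]
best, conf, support = group_relation(matrix, [0.5, 0.2], [0.4, 0.1])
```

In this example the single HS pair wins because both of its words carry high TF-IDF weight, even though NULL occurs twice in the matrix.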
Once the relationships for the four keyword-group categories have been obtained, the semantic relationship between the indicators is determined from them, according to the rules shown in Figure 4. In particular, when determining a same-class relationship, the computed semantic confidence of the indicator pair must be compared with the same-class threshold TH_ls: only if it exceeds this threshold is a same-class relationship established; otherwise the two indicators are treated as unrelated. The reason is that in a same-class relationship only one keyword-group category has a high confidence, while the confidences of the other two categories may be high or low. The comparison ensures that same-class relationships are found thoroughly while misjudgments are avoided. For the other six types of indicator semantic relationship, the confidences of the three keyword-group categories are never too low, so this problem does not arise.
Step 4. Indicator relationship optimization
To quantify the quality of the semantic alignment produced by this technical framework, the present invention defines the following evaluation metrics:
1. Maximum node in-degree
The in-degree of a node expresses how dependent the node is within the composite indicator evaluation system: many related variables or indicators determine or influence the value of this indicator. A larger maximum in-degree means that the indicator system has a shallower hierarchy, a lower fault tolerance and a lower probability of error propagation.
2. Maximum node out-degree
The out-degree of a node expresses its importance within the composite indicator evaluation system: the indicator can determine or influence the values of several other indicators. A larger maximum out-degree means that the indicator system is more complex and less stable, and that a change to one indicator is more likely to ripple through the whole system.
3. Coverage
The proportion of all indicators that have been linked to at least one other indicator through semantic alignment. Higher coverage means that the indicators are more closely connected and that more semantic relationships have been mined; lower coverage means that there are many isolated indicators and that the model has more unknowns, because in a systematic set of service evaluation indicators there should in theory be no isolated indicator unaffected by the others. Let v_i denote the i-th node of the indicator semantic relationship network, O(v_i) its out-degree and I(v_i) its in-degree, and let Λ_k("Condition") denote the number of elements that satisfy a given condition. Coverage is then computed as:
Figure PCTCN2021089373-appb-000011
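The first three metrics can be computed directly from the relationship network. The sketch below assumes directed edges given as (source, target) pairs and assumes that coverage counts the nodes whose in-degree or out-degree is nonzero, which matches the prose definition; the function name `graph_metrics` and the sample graph are hypothetical.

```python
def graph_metrics(nodes, edges):
    """nodes: list of indicator names; edges: list of (source, target)."""
    in_deg = {v: 0 for v in nodes}
    out_deg = {v: 0 for v in nodes}
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    # A node is covered when it has at least one incoming or outgoing edge.
    linked = sum(1 for v in nodes if in_deg[v] + out_deg[v] > 0)
    return {
        "max_in_degree": max(in_deg.values(), default=0),
        "max_out_degree": max(out_deg.values(), default=0),
        "coverage": linked / len(nodes) if nodes else 0.0,
    }


# Hypothetical network: indicator "d" is isolated.
metrics = graph_metrics(
    ["a", "b", "c", "d"],
    [("a", "b"), ("a", "c"), ("b", "c")],
)
```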
4. Hit rate
In the service value-quality-capability modeling stage, users are also allowed to define indicator relationships and relationship types manually; these are taken as the certain set Set_certain. The hit rate measures, as a proportion, how many of the relationships in the certain set appear among the indicator semantic relationships mined by the above method, where e_j denotes the j-th edge of the indicator semantic relationship network and Λ_e("Condition") denotes the number of elements that satisfy a given condition.
Figure PCTCN2021089373-appb-000012
5. Error rate
The proportion of mined indicator semantic relationships whose type was determined incorrectly, or which link indicators that a human would judge to be completely unrelated.
Figure PCTCN2021089373-appb-000013
6. Novelty
The proportion of mined indicator semantic relationships that are not among the relationships defined manually in the modeling stage and whose determination is correct.
Figure PCTCN2021089373-appb-000014
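A rough sketch of these three ratios is given below. The formula images are not reproduced here, so the denominators are assumptions: the hit rate is taken relative to the certain set, and the error rate and novelty relative to the mined relationships. The function name `alignment_quality`, the `verified_correct` input (relationships a human reviewer accepts) and the sample relationships are hypothetical.

```python
def alignment_quality(mined, certain, verified_correct):
    """Each argument is a set of (indicator_a, indicator_b, relation_type)."""
    # Assumed denominator: size of the manually defined certain set.
    hit_rate = len(mined & certain) / len(certain) if certain else 0.0
    # Assumed denominator for the next two ratios: number of mined relations.
    wrong = mined - verified_correct
    error_rate = len(wrong) / len(mined) if mined else 0.0
    novel = (mined - certain) & verified_correct
    novelty = len(novel) / len(mined) if mined else 0.0
    return {"hit_rate": hit_rate, "error_rate": error_rate, "novelty": novelty}


# Hypothetical relationships, for illustration only.
r1 = ("packing_rate", "packing_efficiency", "same")
r2 = ("delivery_time", "quality_preservation", "indicator_correlation")
r3 = ("packing_accuracy", "order_accuracy", "same_class")
r4 = ("storage_time", "table_turnover", "same")
quality = alignment_quality(
    mined={r1, r3, r4},
    certain={r1, r2},
    verified_correct={r1, r2, r3},
)
```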
7. Number of occurrences and average confidence of each semantic relationship type
This step only serves to analyze the alignment result of the above method in detail. A high share of similarity relationships indicates high redundancy in the indicator evaluation system; a high share of correlation relationships indicates that the indicators are closely interrelated; a high share of same-class relationships means that the indicator system is fairly detailed.
This method depends heavily on the dictionaries and on the thresholds used to determine word-level semantic associations, so with manually chosen initial inputs the resulting semantic alignment may mine relationships insufficiently or incorrectly. The hit rate, error rate and novelty mentioned in the evaluation above all increase with coverage: the more indicator relationships are mined, the higher the hit rate and the novelty, but also the higher the error rate. Controlling the number of mined relationships is therefore one starting point for optimization, and it can be done by resetting the confidence level used in the semantic relationship determination.
On the other hand, the richness of the indicator content also affects the relationship determination. If an indicator is described too tersely (the service content, business activity or evaluation aspect is incomplete), it tends to be classified into a same-class relationship. Therefore, if the share of same-class relationships is high and the error rate is high, the result can be improved by supplementing the indicator explanations.
Finally, if an irreducible error rate remains, the only option is to rely on human effort and optimize the alignment result by manually adding and deleting indicator relationships.
Taking the Hema Xiansheng (盒马鲜生) service as an example, the results of indicator preprocessing and semantic alignment are shown in Figures 5 and 6.
The purpose of quantitative alignment in the present invention is as follows: given sample data of an indicator under different space-time boundary conditions, define the space-time boundaries and divide the service domains; use kernel density estimation to fit the distribution of the indicator over single domains and rich domains; derive the probability distribution function from the fitted probability density function; and then, using quantiles as the reference, solve for the indicator values that correspond to each other under different space-time boundary characteristics. The mapping between a specific indicator value and the actual service level is not fixed: under different space-time boundary conditions the same indicator value may correspond to different service levels, and different service levels may yield the same indicator value. For example, price levels and average commodity prices differ markedly by region: the same average commodity price counts as high in Harbin but low in Shanghai. Delivery efficiency and delivery time likewise differ across time, space and domain. Taking the time domain as an example, efficient delivery takes only 20 minutes in off-peak dining hours, generally about 30 to 40 minutes in peak dining hours, and 50 to 60 minutes at midnight. If the differences in an indicator's distribution across space-time boundaries are ignored, service decisions and optimizations become ineffective or unbalanced. For example, if an enterprise adopts a uniform nationwide price increase, low-income regions experience a marked rise in prices while high-income regions notice little difference. With the quantitative alignment method of the present invention, decision makers can see how indicator values are distributed across space-time boundaries and devise reasonable enterprise decisions according to the alignment mapping function.
The method of the present invention for spatiotemporal-feature-oriented quantitative alignment of multi-participant service value-quality-capability evaluation indicators is implemented in the following steps:
Step 1. Definition of space-time boundaries and division of service domains
Step 1.1. Time domain
The time domain is naturally continuous and can be described with interval numbers. It is defined in the following ways:
1. Clock-triggered
[T_start, T_end]: T_start is a past moment or the current moment, and T_end is a specific cut-off time;
[T_start, T_end]_period: T_start and T_end are fixed, and a clock period `period` is defined;
[N_i, N_j]_slice: a fixed time slice `slice` is defined, starting at the N_i-th slice and ending at the N_j-th slice.
2. Event-triggered
[T_E-start, T_E-end]_Event: T_E-start is the time the event occurs, T_E-end is the time its influence ends, and Event is the event that triggers the time domain.
[T_E-start, T_E-start + Δt]_Event: T_E-start is the time the event occurs, and Δt is the duration of its influence; in particular, Δt = 0 means that the influence of Event is abrupt.
3. Activity-triggered
[-∞, T_A-start]_Activity: the time period before the activity starts at T_A-start.
[T_A-start, T_A-end]_Activity: the time period during which the activity is being executed.
[T_A-start, ∞]_Activity: the time period after the activity starts at T_A-start.
[T_A-end, ∞]_Activity: the time period after the activity ends at T_A-end.
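As an illustration of the clock-triggered form [T_start, T_end]_period, the sketch below tests whether a timestamp falls into a window that recurs every `period`. The interpretation that the window [T_start, T_end] repeats with the given period is an assumption, and the function name `in_periodic_window` and the sample times are hypothetical.

```python
from datetime import datetime, timedelta


def in_periodic_window(t, t_start, t_end, period):
    """True if t lies in a repetition of [t_start, t_end] every `period`.

    Assumes the first window starts at t_start and that the window length
    t_end - t_start does not exceed the period.
    """
    if t < t_start:
        return False
    # Position of t inside the current period, measured from its start.
    offset = (t - t_start) % period
    return offset <= (t_end - t_start)


# Hypothetical lunch-peak window: 11:00 to 13:00, repeating daily.
start = datetime(2021, 1, 1, 11, 0)
end = datetime(2021, 1, 1, 13, 0)
day = timedelta(days=1)
at_lunch = in_periodic_window(datetime(2021, 1, 5, 12, 30), start, end, day)
after_lunch = in_periodic_window(datetime(2021, 1, 5, 15, 0), start, end, day)
```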
Step 1.2. Spatial domain
Put simply, the spatial domain is the domain of geographic locations, which can be described with set algebra. It is defined in the following ways:
1. Location: ① a geographic location with latitude and longitude attributes; ② a street, business district, community or similar place with a proper name; ③ the name of a province, city or district as determined by the national administrative divisions.
2. Neighborhood: a geographic range determined by a location s_0 and a neighborhood radius ρ.
Figure PCTCN2021089373-appb-000015
3. Regional attributes: regions can be ranked by regional advantage (for example economic development, population density, education level or consumption index); each region has a corresponding Rank value, which determines a partial order.
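A neighborhood test for locations given as latitude and longitude can be sketched as follows. The neighborhood formula itself is in the figure and is not reproduced here; this sketch assumes that membership means the great-circle distance to s_0 is at most ρ, uses the haversine formula with a mean Earth radius of 6371 km, and uses hypothetical names (`haversine_km`, `in_neighborhood`) and coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def in_neighborhood(s, s0, rho_km):
    """True if location s lies within radius rho_km of the center s0."""
    return haversine_km(s, s0) <= rho_km


# Hypothetical points: one degree of latitude is roughly 111 km.
center = (0.0, 0.0)
inside = in_neighborhood((1.0, 0.0), center, 120.0)
outside = in_neighborhood((1.0, 0.0), center, 100.0)
```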
Step 1.3, Generalized Domain
The generalized domain divides the service domain into several sub-domains according to a boundary rule, highlighting the characteristics of each sub-domain and the fusion and transition that occur between sub-domains through business optimization, service collaboration, and the like. Boundary rules can be formulated according to the industry field, the content and nature of the service, the technology platform on which service execution depends, and so on. The traditional definition of a service boundary is limited to the management boundaries between autonomous organizations, and treats every other boundary as equivalent to the technology-platform independence and service-content partitioning that organizational boundaries cause. With the promotion and spread of SaaS cloud platforms, however, organizational boundaries are no longer sufficient to characterize service boundaries. Richer service boundaries need to be defined to provide a basis for judgment during service collaboration and integration.
Step 2. Fit the single-domain/rich-domain distribution characteristics of the index
In general, the distribution type of the sample data cannot be estimated in advance, nor can the number of peaks in the distribution curve be known, so ordinary parametric estimation schemes are not applicable. The present invention therefore uses kernel density estimation for non-parametric estimation and performs the probability distribution fitting with the Statsmodels library: "gau" is selected as the kernel function and "scott" as the bandwidth calculation function, the sample data DataSet_d' under a given service domain d' is input, and the KDEUnivariate function is used to fit the probability density function pdf_d' and the cumulative distribution function cdf_d' of the index on the service domain d'. Taking the refund and change fee standards of the three major domestic airlines as an example, Figure 7 shows the single-domain distribution characteristics of the index along the three dimensions of cabin class, departure time, and airline, and Figure 8 shows its rich-domain distribution characteristics over cabin class and departure time. The index distribution differs markedly across domains.
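To make the fitting step concrete, here is a standard-library sketch of a Gaussian kernel density estimate that returns a pdf and a cdf for one service domain. It is a stand-in for illustration rather than the Statsmodels implementation the text names. The bandwidth is a simplified Scott-style rule of thumb (1.059 · σ · n^(-1/5)), and the library's own "scott" rule may differ in detail. The function names are hypothetical.

```python
import math
import statistics


def scott_bandwidth(samples):
    """Simplified Scott-style rule of thumb (an assumption, not the library's exact rule)."""
    n = len(samples)
    return 1.059 * statistics.stdev(samples) * n ** (-0.2)


def gaussian_kde(samples, bandwidth=None):
    """Return (pdf, cdf) callables fitted to samples with a Gaussian kernel."""
    h = bandwidth if bandwidth is not None else scott_bandwidth(samples)
    n = len(samples)
    norm = n * h * math.sqrt(2 * math.pi)

    def pdf(x):
        # Average of Gaussian bumps centred on each sample.
        return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples) / norm

    def cdf(x):
        # Average of the Gaussian CDFs centred on each sample.
        return sum(0.5 * (1 + math.erf((x - xi) / (h * math.sqrt(2)))) for xi in samples) / n

    return pdf, cdf
```

Fitting one such pair per sub-domain (for example per cabin class) gives the single-domain distributions. Fitting on samples filtered by two or more domain conditions at once gives a rich-domain distribution.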
Step 3. Compute the alignment relationship of the index in terms of quantification
Step 2 yields the characteristic distributions of the index on the service domains of different time-space boundaries. These distribution functions are then used to establish the correspondence between index values across time-space boundaries. The present invention uses the quantile α as the alignment reference. Suppose index I follows two distributions, cdf(I_a) and cdf(I_b), on the two service domains a and b. Inverting each cumulative distribution function gives a function of α, α∈[0,1], so each quantile α' corresponds to two index values i'_a and i'_b. This establishes the correspondence between index values on the two service domains, as shown in Figure 9. Alignment across more than two time-space boundaries is likewise based on quantiles: a service level can be converted to a number in [0,1], from which the specific index value corresponding to that service level under each time-space boundary condition follows.
Matters not addressed in the present invention are known in the art.
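The quantile-based alignment can be sketched as follows: a value in domain a is converted to its quantile α with cdf_a, and cdf_b is then inverted at α. The inversion here uses simple bisection over a caller-supplied bracket, which is one possible choice and is not specified by the source. The function names and the normal distributions in the usage note are hypothetical.

```python
import math


def invert_cdf(cdf, alpha, lo, hi, iterations=100):
    """Find x in [lo, hi] with cdf(x) ~= alpha by bisection (cdf must be non-decreasing)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if cdf(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def align_value(value_a, cdf_a, cdf_b, lo, hi):
    """Map an index value in domain a to the value at the same quantile in domain b."""
    alpha = cdf_a(value_a)
    return invert_cdf(cdf_b, alpha, lo, hi)


def level_to_values(alpha, cdfs, lo, hi):
    """Translate a service level alpha in [0, 1] into one index value per domain."""
    return {domain: invert_cdf(cdf, alpha, lo, hi) for domain, cdf in cdfs.items()}


def normal_cdf(mu, sigma):
    """Helper for the usage example only: CDF of a normal distribution."""
    return lambda x: 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))
```

For example, with `cdf_a = normal_cdf(10, 2)` and `cdf_b = normal_cdf(50, 10)`, calling `align_value(12.0, cdf_a, cdf_b, -1000, 1000)` maps a value one standard deviation above the mean in domain a to the value one standard deviation above the mean in domain b.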

Claims (10)

  1. A time-space boundary-oriented multi-party service value-quality-capability index alignment method, characterized in that the method comprises the following steps:
    Step 1: extract, from the definitions of the value-quality-capability evaluation indexes, keyword groups covering service content, business activity, index evaluation aspect, and index evaluation rule;
    Step 2: according to a public dictionary, a domain dictionary, and a self-built dictionary, compute the morpheme relationships between the four types of keyword groups for each pair of indexes, obtaining a semantic similarity matrix between the indexes;
    Step 3: determine the semantic relationship between indexes by means of the semantic similarity matrix, and compute the relationship confidence;
    Step 4: determine the semantic relationships of all indexes according to Step 3 to obtain a semantic relationship network, and delete redundant edges according to the direction and number of the semantic relationships between indexes, thereby simplifying the semantic network;
    Step 5: fit the distribution characteristics of each index on single domains and rich domains according to its sample data under different time-space boundaries;
    Step 6: establish the alignment relationship in terms of index quantification, with probability quantiles as the reference.
  2. The time-space boundary-oriented multi-party service value-quality-capability index alignment method according to claim 1, characterized in that in Step 1, the index definition includes the index name, abbreviations/idioms, English abbreviation, index explanation, superior direction, dimension, value range, and calculation formula.
  3. The time-space boundary-oriented multi-party service value-quality-capability index alignment method according to claim 1, characterized in that in Step 1, the keyword groups specifically refer to: ① service content; ② business activity; ③ evaluation aspect; ④ evaluation rule.
  4. The time-space boundary-oriented multi-party service value-quality-capability index alignment method according to claim 1, characterized in that in Step 2, the public dictionary includes Baidu Chinese Dictionary, HowNet, and Synonym Forest (extended version); the domain dictionary includes the Sogou industry thesaurus and the Baidu industry thesaurus; and the definition of a phrase in the self-built dictionary includes several of the following: ID, phrase, part of speech, category, synonyms, antonyms, same-class words, hypernyms, hyponyms, causally related phrases, belonging/source related phrases, usage/tool related phrases, composition/whole-part related phrases, and execution-dependency related phrases.
  5. The time-space boundary-oriented multi-party service value-quality-capability index alignment method according to claim 1, characterized in that in Step 2, the morpheme relationships include four types: similar, near, related, and same-class.
  6. The time-space boundary-oriented multi-party service value-quality-capability index alignment method according to claim 1, characterized in that in Step 3, the semantic relationships include:
    Similarity relationships: ① same index; ② conjugate indexes; ③ superior-subordinate indexes;
    Correlation relationships: ④ service-content related; ⑤ business related; ⑥ index related;
    Same-class indexes: ⑦ same-class service evaluation aspect; ⑧ same-class business; ⑨ same-class service content.
  7. The time-space boundary-oriented multi-party service value-quality-capability index alignment method according to claim 1, characterized in that in Step 3, the relationship confidence
    Figure PCTCN2021089373-appb-100001
    is computed as follows:
    Figure PCTCN2021089373-appb-100002
    where n and m denote index I_n and index I_m respectively, k refers to the four types of keyword groups, and num is the number of words in
    Figure PCTCN2021089373-appb-100003
    whose RelationType = r_Max.
  8. The time-space boundary-oriented multi-party service value-quality-capability index alignment method according to claim 1, characterized in that in Step 4, the semantic relationship network is a network whose nodes are indexes and whose edges are the semantic relationships between indexes; the edge attributes are the semantic relationship type and the confidence, and an edge may be directed or undirected.
  9. The time-space boundary-oriented multi-party service value-quality-capability index alignment method according to claim 1, characterized in that in Step 5, "time" refers to different time domains, "space" refers to different geographic domains, and "boundary" refers to different service implementation environments, different service implementation platforms, or different service participants; the single-domain distribution characteristics are the probability distribution characteristics of an index on one service domain, and the rich-domain distribution characteristics are the probability distribution characteristics of an index on two or more service domains.
  10. The time-space boundary-oriented multi-party service value-quality-capability index alignment method according to claim 1, characterized in that in Step 6, the alignment relationship in terms of index quantification means solving for the value range of an index that corresponds to a given service level under different time-space boundary characteristics, or determining the service level to which an index value under a specific time-space boundary maps.
PCT/CN2021/089373 2020-08-18 2021-04-23 Time-space boundary-oriented multi-party service value-quality-capability index alignment method WO2022037103A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010833133.2A CN111898928B (en) 2020-08-18 2020-08-18 Multi-party service value-quality-capability index alignment method facing space-time boundary
CN202010833133.2 2020-08-18

Publications (1)

Publication Number Publication Date
WO2022037103A1 true WO2022037103A1 (en) 2022-02-24

Family

ID=73229209

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/089373 WO2022037103A1 (en) 2020-08-18 2021-04-23 Time-space boundary-oriented multi-party service value-quality-capability index alignment method

Country Status (2)

Country Link
CN (1) CN111898928B (en)
WO (1) WO2022037103A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898928B (en) * 2020-08-18 2021-08-31 哈尔滨工业大学 Multi-party service value-quality-capability index alignment method facing space-time boundary
CN112732251A (en) * 2020-12-25 2021-04-30 哈尔滨工业大学 Semi-automatic generation method of service value network facing service internet

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102622675A (en) * 2012-03-14 2012-08-01 浙江大学 Method and system for realizing interoperation of enterprises under cluster supply chain environment
CN106156082A (en) * 2015-03-31 2016-11-23 华为技术有限公司 A kind of body alignment schemes and device
CN107315768A (en) * 2017-05-17 2017-11-03 上海交通大学 The distribution information interacting method and system mapped based on Heterogeneous Information model
US20170364503A1 (en) * 2016-06-17 2017-12-21 Abbyy Infopoisk Llc Multi-stage recognition of named entities in natural language text based on morphological and semantic features
CN111898928A (en) * 2020-08-18 2020-11-06 哈尔滨工业大学 Multi-party service value-quality-capability index alignment method facing space-time boundary

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101645934B (en) * 2009-08-31 2012-02-29 东软集团股份有限公司 Web service evaluation method based on weight, Web service finding method and device thereof
CN105740237B (en) * 2016-02-03 2018-04-13 湘潭大学 A kind of student ability degree of reaching evaluation measure based on Similarity of Words
CN110175325B (en) * 2019-04-26 2023-07-11 南京邮电大学 Comment analysis method based on word vector and syntactic characteristics and visual interaction interface

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI JIA, ZHU, MING; LIU, CHENG; YANG, ZHENG-QIU: "Research and Implementation on Chinese Ontology Mapping", JOURNAL OF CHINESE INFORMATION PROCESSING, vol. 21, no. 4, 31 July 2007 (2007-07-31), XP055902041, ISSN: 1003-0077 *

Also Published As

Publication number Publication date
CN111898928A (en) 2020-11-06
CN111898928B (en) 2021-08-31

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21857204

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21857204

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 09/08/2023)
