CN113946668A - Semantic processing method, system and device based on edge node and storage medium - Google Patents

Info

Publication number
CN113946668A
Authority
CN
China
Prior art keywords
semantic
edge node
corpus
industry
module
Prior art date
Legal status
Pending
Application number
CN202111165947.4A
Other languages
Chinese (zh)
Inventor
龚晟
杨震
Current Assignee
Tianyi IoT Technology Co Ltd
Original Assignee
Tianyi IoT Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Tianyi IoT Technology Co Ltd
Priority to CN202111165947.4A
Publication of CN113946668A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a semantic processing method, system, device and storage medium based on edge nodes. The semantic processing method is executed by an edge node. First, the edge node acquires a corpus to be processed sent by a terminal and performs industry feature matching on the corpus to be processed according to an industry knowledge base and a scene corpus located at the edge node, so as to determine the corpus scene corresponding to the corpus to be processed. According to the corpus scene, the edge node then selects a corresponding industry language model to process the corpus to be processed and generate a first semantic result, and sends the first semantic result back to the terminal to complete the semantic processing. Because the corpus to be processed is processed by an industry language model, the semantic processing result matches the industry more closely; in addition, the semantic processing is mainly completed at the edge node, which helps meet users' security and privacy requirements and has a positive effect on the popularization of the semantic processing technology.

Description

Semantic processing method, system and device based on edge node and storage medium
Technical Field
The present application relates to the field of semantic processing technologies, and in particular, to a semantic processing method, system, apparatus, and storage medium based on edge nodes.
Background
With the continuous development of artificial intelligence, semantic processing technology built on semantic understanding has also developed rapidly. Taking human-computer interaction devices that apply semantic understanding as an example, semantic processing lets people interact with machines in more natural language, which lowers the operating threshold of such devices and improves working efficiency.
However, the general-purpose semantic understanding services in the related art do not adapt well to specific business scenarios. Industries such as healthcare and finance use a large amount of specialized vocabulary, and the vocabulary of some new media industries changes very quickly, so general-purpose semantic understanding services struggle to meet the scene recognition requirements of specific businesses. In addition, the information and data assets of various industries carry security and privacy requirements that general-purpose semantic processing services find difficult to meet.
Disclosure of Invention
The present application aims to solve, at least to some extent, one of the technical problems in the related art. To this end, the application provides a semantic processing method, system, device and storage medium based on edge nodes.
In a first aspect, an embodiment of the present application provides an edge node-based semantic processing method. The method is performed by an edge node in an edge node-based semantic processing system that includes the edge node and a terminal, and the method includes: obtaining a corpus to be processed; performing industry feature matching on the corpus to be processed according to an industry knowledge base and a scene corpus located at the edge node, to determine a corpus scene; determining an industry language model according to the corpus scene; determining a first semantic result according to the industry language model and the corpus to be processed; and returning the first semantic result to the terminal.
Optionally, the edge node-based semantic processing system further includes a cloud, and the method further includes: calculating a first confidence of the first semantic result according to the industry knowledge base and the scene corpus; when the first confidence is lower than a preset confidence threshold, sending a cooperative processing request to the cloud, so that the cloud obtains a supplementary corpus and returns it to the edge node; determining a plurality of second semantic results according to the supplementary corpus and the first semantic result; calculating a second confidence of each second semantic result according to the industry knowledge base and the scene corpus; and returning the second semantic result with the highest second confidence to the terminal.
Optionally, the method further includes: adding the supplementary corpus to the industry knowledge base and the scene corpus.
Optionally, enabling the cloud to obtain the supplementary corpus includes: the cloud retrieving the supplementary corpus from the Internet according to a preset retrieval condition, wherein the retrieval condition includes having the same pronunciation as the first semantic result.
Optionally, the method further includes a process of constructing the industry language model, which specifically includes: randomly extracting N classes of samples from the obtained corpus samples as a first sample set, the first sample set including N classes of first samples, wherein the sample classes in the first sample set include scene results and target topics; extracting K instances from each class of first samples as a first instance set, the first instance set including K first instances, wherein each instance is a feature word; taking all the extracted first instances as a support set and all instances in the first sample set other than the first instances as a query set, wherein the support set is used for model training and the query set is used for model testing; training and testing the industry language model using the support set and the query set; during training, gradually increasing the weight of the labeled instances among the first instances by a gradient weight increase method; and completing the construction of the industry language model when the number of training rounds reaches a preset first number; wherein N and K are both positive integers.
In a second aspect, an embodiment of the present application provides an edge node-based semantic processing system in the form of an apparatus applied to an edge node, where the edge node and a terminal together form the overall edge node-based semantic processing system, and the apparatus includes: a first module, a second module, a third module, a fourth module and a fifth module; the first module is used for acquiring a corpus to be processed; the second module is used for performing industry feature matching on the corpus to be processed and determining a corpus scene; the third module is used for determining an industry language model according to the corpus scene; the fourth module is used for determining a first semantic result according to the industry language model and the corpus to be processed; and the fifth module is used for returning the first semantic result to the terminal.
Optionally, the edge node-based semantic processing system further includes a cloud, and the apparatus further includes: a sixth module, a seventh module, an eighth module, a ninth module, and a tenth module; the sixth module is used for calculating a first confidence of the first semantic result according to an industry knowledge base and a scene corpus located at the edge node; the seventh module is used for obtaining a supplementary corpus through the cloud when the first confidence is lower than a preset confidence threshold; the eighth module is used for determining a plurality of second semantic results according to the supplementary corpus and the first semantic result; the ninth module is used for calculating a second confidence of each second semantic result according to the industry knowledge base and the scene corpus; and the tenth module is used for returning the second semantic result with the highest second confidence to the terminal.
In a third aspect, an embodiment of the present application provides an apparatus, including: at least one processor; and at least one memory for storing at least one program; wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the edge node-based semantic processing method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer storage medium storing a processor-executable program which, when executed by a processor, implements the edge node-based semantic processing method according to the first aspect.
The beneficial effects of the embodiments of the present application are as follows. The method is executed by an edge node. First, the edge node acquires a corpus to be processed sent by a terminal and performs industry feature matching on the corpus to be processed according to an industry knowledge base and a scene corpus located at the edge node, so as to determine the corpus scene corresponding to the corpus to be processed. According to the corpus scene, the edge node selects a corresponding industry language model to process the corpus to be processed and generate a first semantic result, and sends the first semantic result back to the terminal to complete the semantic processing. Processing the corpus to be processed with an industry language model improves how well the semantic processing result fits the corresponding industry and service scene; in addition, the semantic processing is mainly completed at the edge node, which helps meet users' security and privacy requirements and has a positive effect on the popularization of the semantic processing technology.
Drawings
The accompanying drawings are included to provide a further understanding of the claimed subject matter and are incorporated in and constitute a part of this specification, illustrate embodiments of the subject matter and together with the description serve to explain the principles of the subject matter and not to limit the subject matter.
FIG. 1 is a schematic diagram of an edge node-based semantic processing system provided by an embodiment of the present application;
FIG. 2 is a flowchart of the steps of an edge node-based semantic processing method provided by an embodiment of the present application;
FIG. 3 shows the correspondence among scene results, feature words and target topics provided by an embodiment of the present application;
FIG. 4 is a flowchart of the steps of updating a semantic result according to a third-party resource provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of an edge node-based semantic processing system provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of an apparatus provided by an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It should be noted that although functional block divisions are provided in the system drawings and logical orders are shown in the flowcharts, in some cases, the steps shown and described may be performed in different orders than the block divisions in the systems or in the flowcharts. The terms first, second and the like in the description and in the claims, and the drawings described above, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The embodiments of the present application will be further explained with reference to the drawings.
Referring to FIG. 1, which is a schematic diagram of an edge node-based semantic processing system provided in an embodiment of the present application, the system 100 includes a terminal 110 and an edge node 120. In this embodiment, the terminal may be any electronic device capable of submitting a semantic processing request to the edge node, such as a mobile phone, a smartphone, a personal digital assistant (PDA), a wearable device, a pocket PC (PPC), or a tablet computer. It is understood that the terminal may support human-computer interaction through one or more of a keyboard, a touch pad, a touch screen, a remote controller, voice interaction, or a handwriting device. The terminal can submit the semantic processing request to the edge node and receive the semantic processing result returned by the edge node.
The edge node may be any electronic device capable of performing semantic processing, such as a mobile phone, a smartphone, a personal digital assistant (PDA), a wearable device, a pocket PC (PPC), or a tablet computer. The edge node can receive a semantic processing request from the terminal, perform the semantic processing service, and return a semantic processing result to the terminal.
In other embodiments, as shown in FIG. 1, the edge node-based semantic processing system of the embodiments of the present application further includes a cloud 130. In the embodiments of the present application, the cloud is a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (CDN), big data, and artificial intelligence platforms. The cloud can receive a cooperative processing request from the edge node, execute the corresponding web search service, and return the supplementary corpus to the edge node.
The edge node-based semantic processing method proposed by the embodiment of the present application can be implemented by the edge node-based semantic processing system shown in fig. 1, and a specific implementation process of the method will be described in the following.
Referring to fig. 2, fig. 2 is a flowchart illustrating steps of an edge node-based semantic processing method provided by an embodiment of the present application, where the method is executed by an edge node 120 in the edge node-based semantic processing system shown in fig. 1, and the method includes, but is not limited to, steps S200 to S240:
S200, obtaining a corpus to be processed;
Specifically, the edge node obtains a corpus to be processed that requires semantic processing; the corpus is sent by the terminal 110 in FIG. 1, and the edge node performs semantic processing on it in response to the terminal's semantic processing request. In the embodiments of the present application, the corpora to be processed belong to a single industry and may take forms including but not limited to user conversations, common instructions, industry papers, and industry-related news. Using in-industry corpora as the training corpora for semantic processing improves how well the semantic processing service fits the industry, and thus the accuracy of the edge node-based semantic processing method.
S210, performing industry feature matching on the corpus to be processed according to an industry knowledge base and a scene corpus located at the edge node, and determining a corpus scene;
Specifically, different industries generally have their own proper nouns, and semantic analysis relies on the interpretation of these nouns. Therefore, the embodiments of the present application provide an industry knowledge base stored at the edge node, whose content is generally the explanation of proper nouns in the industry. For example, in the transportation industry, the industry knowledge base may store explanations of terms such as "morning peak", "accident-prone road section", and "blind spot". When a sentence is analyzed semantically, it is necessary not only to clarify the specific meaning of the industry proper nouns in the sentence, but also to determine the intention of the sentence from the relations between its words. Therefore, the embodiments of the present application further provide a scene corpus stored at the edge node, by which the specific meaning of a sentence can be analyzed. For example, in the field of automobile control, when the input sentence is "open the driver's seat window", the specific meanings of "driver's seat" and "window" can be determined from the industry knowledge base, and the scene corpus makes it possible to determine that the intention of the sentence is to open a designated window, and thus which action the vehicle needs to perform next.
In this step, industry feature matching is performed between the obtained corpus to be processed and the industry knowledge base and scene corpus, to determine which industry the corpus to be processed belongs to. It can be understood that, even within one industry, corpus features may differ between services, so the corpora of an industry can be subdivided by service scene. Industry feature matching in this step can therefore also identify which service scene within the industry the corpus to be processed belongs to, thereby determining the corpus scene that currently corresponds to the corpus to be processed.
In some embodiments, industry feature matching may specifically consist of labeling keywords in each industry knowledge base and matching the words in the corpus to be processed against those keywords; among the different industry knowledge bases, the one with the most keywords matching the corpus to be processed indicates the industry to which the corpus belongs.
In other embodiments, the industry knowledge bases of similar industries overlap heavily, so industry feature matching may discriminate poorly between similar industries and it is difficult to match the corpus to be processed to the correct industry. In this case, the word frequency of the same keyword in different industry knowledge bases can also be taken into account: when the corpus to be processed matches a similar number of keywords in two industry knowledge bases, the industry of the corpus can be determined by comparing the word frequencies of those keywords in the respective knowledge bases.
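The keyword matching and word-frequency comparison described above can be sketched as follows. This is only an illustrative sketch: the knowledge base contents, the frequency values, the function name `match_industry`, and the choice to treat only an exact tie in keyword count as "similar" are assumptions, not details given in the application.

```python
# Hypothetical industry knowledge bases: keyword -> word frequency in that base.
# All entries and numbers are made up for illustration.
KNOWLEDGE_BASES = {
    "transportation": {"accident-prone road section": 30, "blind spot": 25},
    "automotive": {"blind spot": 5, "window": 50, "driver's seat": 35},
}


def match_industry(corpus, knowledge_bases):
    """Return the industry whose keywords best match the corpus, or None.

    Primary score: number of knowledge-base keywords found in the corpus.
    Tie-break (assumption): summed word frequency of the matched keywords.
    """
    text = corpus.lower()
    scores = {}
    for industry, keywords in knowledge_bases.items():
        matched = [kw for kw in keywords if kw in text]
        # Tuples compare element by element: keyword count first, then frequency.
        scores[industry] = (len(matched), sum(keywords[kw] for kw in matched))
    best = max(scores, key=lambda name: scores[name])
    return best if scores[best][0] > 0 else None
```

In this sketch, a sentence such as "check the blind spot" matches one keyword in each knowledge base, so the higher word frequency in the transportation knowledge base decides the result.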
S220, determining an industry language model according to the corpus scene;
Specifically, language models corresponding to a plurality of industries are stored at the edge node in the embodiments of the present application, so the corresponding industry language model can be determined according to the corpus scene determined in step S210. An industry language model adds industry-specific language attributes to a general language model. A general language model typically defines a probability distribution over sequences of tokens in a natural language, where the tokens are usually words, characters, bytes, or the like. The language model is used to recognize the utterance and generate a semantic processing result. The industry language model is a lightweight model deployed at the edge node and can be obtained by training on a small number of corpus samples; its training method is described below.
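For reference, the probability distribution over token sequences mentioned above is commonly written with the chain rule of probability. This is the standard textbook factorization, not a formula taken from the application; here w_1, ..., w_n denote the tokens of one sequence.

```latex
P(w_1, w_2, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
```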
In the embodiments of the present application, the industry language model needs to be trained. Considering that corpora are scarce in some industries, a small sample learning method (raw corpus learning, commonly known as few-shot learning) is adopted for training. Specifically, a large number of corpus samples of the industry are obtained, and N classes of samples are randomly extracted from them as a first sample set, the first sample set including N classes of first samples, where the sample classes in the first sample set include scene results and target topics. From each class of first samples, K instances are extracted as a first instance set, the first instance set including K first instances, where each instance is a feature word. All the extracted first instances are taken as a support set, and all instances in the first sample set other than the first instances are taken as a query set; the support set is used for model training and the query set for model testing. The industry language model is trained and tested using the support set and the query set. FIG. 3 shows the correspondence among scene results, feature words and target topics provided by an embodiment of the present application; as shown in FIG. 3, the correspondence is non-linear, M is the number of correspondences, and N is the number of feature words. During training, the weight of the labeled instances among the first instances is increased step by step using a gradient weight increase method, where a labeled instance is a manually marked part of the first instances; that is, the weights of some of the marked feature words in FIG. 3 are manually increased. The basis for increasing a weight can be user-defined, for example the word frequency or word order of the feature words.
For example, in scene semantic recognition, if the labeling function of feature words is considered more significant, feature words that play a key role in scene recognition are given a fixed high weight, feature words that play a secondary role are given a lower weight, and so on. Increasing the weights in this graded manner allows the industry language model to support semantic understanding of personalized scenes. When the number of training rounds reaches a preset first number, the construction of the industry language model is complete.
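The construction of the support set and query set described above corresponds to drawing one N-way K-shot episode. A minimal sketch, assuming the corpus samples are already grouped by class in a dictionary; the function name `sample_episode`, the `seed` parameter and the data layout are illustrative assumptions, and the weighting of labeled instances is not modeled here.

```python
import random


def sample_episode(samples_by_class, n_way, k_shot, seed=None):
    """Draw one N-way K-shot episode.

    samples_by_class maps a class label to its list of instances (feature words).
    Returns (support, query) as lists of (label, instance) pairs: K instances
    per chosen class go to the support set, the remaining instances of those
    classes go to the query set.
    """
    rng = random.Random(seed)
    eligible = [c for c, items in samples_by_class.items() if len(items) >= k_shot]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with at least k_shot instances")
    classes = rng.sample(eligible, n_way)  # the N randomly extracted classes
    support, query = [], []
    for label in classes:
        items = list(samples_by_class[label])
        rng.shuffle(items)
        support += [(label, inst) for inst in items[:k_shot]]  # used for training
        query += [(label, inst) for inst in items[k_shot:]]    # used for testing
    return support, query
```

Repeating this sampling for each training round yields a fresh support/query split every time, which is the usual way such episodes are used in few-shot training.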
S230, determining a first semantic result according to the industry language model and the corpus to be processed;
Specifically, the corpus to be processed is processed with the industry language model determined in step S220: the corpus is input into the industry language model, which outputs the corresponding semantic processing result, called the first semantic result. The specific content of the first semantic result includes, but is not limited to, a request type and a request result.
S240, returning the first semantic result to the terminal;
Specifically, the edge node returns the first semantic result to the terminal, and the terminal can then execute the corresponding service according to the semantic result. This completes the whole semantic processing flow, from the terminal to the edge node and from the edge node back to the terminal.
Referring to FIG. 1, an embodiment of the present application provides an edge node-based semantic processing system that includes a terminal and an edge node. Through steps S200 to S240, the embodiment of the present application provides an edge node-based semantic processing method executed by the edge node shown in FIG. 1. First, the edge node obtains the corpus to be processed sent from the terminal and performs industry feature matching on it according to the industry knowledge base and scene corpus located at the edge node, so as to determine the corpus scene corresponding to the corpus to be processed. According to the corpus scene, the edge node selects a corresponding industry language model to process the corpus to be processed and generate a first semantic result, and sends the first semantic result back to the terminal to complete the semantic processing. Processing the corpus to be processed with an industry language model improves how well the semantic processing result fits the corresponding industry and service scene; in addition, the semantic processing is mainly completed at the edge node, which helps meet users' security and privacy requirements and has a positive effect on the popularization of the semantic processing technology.
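The flow of steps S200 to S240 can be summarized in a short sketch. Everything named here is hypothetical: the scene label, the `MODELS` registry, the stand-in `detect_scene` function, and the "unsupported" fallback are assumptions made for illustration, and a plain callable stands in for a trained industry language model.

```python
from typing import Callable, Dict

# Hypothetical registry: corpus scene -> industry language model.
# A plain function stands in for a trained model.
MODELS: Dict[str, Callable[[str], dict]] = {
    "automotive/window_control": lambda text: {
        "request_type": "window_control",
        "request_result": text,
    },
}


def detect_scene(corpus: str) -> str:
    """Stand-in for S210; a real system would consult the industry
    knowledge base and scene corpus stored at the edge node."""
    return "automotive/window_control" if "window" in corpus.lower() else "unknown"


def handle_request(corpus: str) -> dict:
    """Edge-node handler: receives the corpus (S200) and returns the result (S240)."""
    scene = detect_scene(corpus)   # S210: determine the corpus scene
    model = MODELS.get(scene)      # S220: select the industry language model
    if model is None:
        # Assumption: the application does not say what happens for unknown scenes.
        return {"request_type": "unsupported", "request_result": None}
    return model(corpus)           # S230: first semantic result
```

The returned dictionary mirrors the request type and request result that the first semantic result is said to contain.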
In some embodiments, the edge node-based semantic processing method provided in the embodiments of the present application further includes a step of updating the semantic result according to a third-party resource. Referring to FIG. 4, which is a flowchart of this step, the method includes, but is not limited to, steps S400 to S440:
S400, calculating a first confidence of the first semantic result according to the industry knowledge base and the scene corpus;
Specifically, a first confidence of the first semantic result is calculated according to the industry knowledge base and scene corpus corresponding to the corpus to be processed. The first confidence represents how well the first semantic result fits the corresponding industry and service scene, and thus reflects the reliability of the first semantic result.
S410, when the first confidence coefficient is lower than a preset confidence coefficient threshold value, initiating a cooperative processing request to the cloud end so that the cloud end can obtain the supplementary corpus and return the supplementary corpus to the edge node;
specifically, the step S400 calculates a first confidence of the first semantic result, and compares the first confidence with a preset confidence threshold, where the confidence threshold is used to represent a minimum confidence required for the reliability of the first semantic result. It can be understood that, if the first confidence is higher than or equal to the preset confidence threshold, which indicates that the current first semantic result is reliable, the edge node may directly return the first semantic result to the terminal. Conversely, if the first confidence is lower than the preset confidence threshold, it is indicated that the current first semantic result is not reliable enough, or it can be said that the current industry language model does not have a corresponding processing result.
If the first semantic result is determined to be not reliable enough according to the first confidence, the edge node initiates a cooperative processing request to the cloud and receives the supplementary corpus sent back by the cloud. Because the storage capacity available to the industry knowledge base and the scene corpus in the edge node is limited, in the embodiments of the present application the supplementary corpus refers to corpus from the same industry other than the content already recorded in the industry knowledge base and the scene corpus.
The cloud may obtain the supplementary corpus by searching sources including, but not limited to, third-party industry knowledge bases and scene corpora, or social networks, and may switch between different search engines during the search to obtain more comprehensive results. The preset search condition may be that the content has the same pronunciation as the first semantic result; for example, the search is performed using the full pinyin of a word or the initial letters of its pronunciation. In other embodiments, content similar to the first semantic result may be retrieved based on information such as term union. The embodiments of the present application do not specifically limit the search mode or search channel of the cloud. The point is that, when the industry knowledge base and the scene corpus in the edge node contain no processing result corresponding to the first semantic result, the scope of industry knowledge can be expanded through cloud collaboration to obtain a supplementary corpus that supplements the industry knowledge base and the scene corpus.
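The pronunciation-based search condition can be illustrated with a minimal sketch. Everything in it is an assumption made for the example and is not part of the application: the tiny hand-written `PINYIN` table, the function names, and the candidate list. A real cloud service would rely on a complete pronunciation lexicon and a search engine.

```python
# Hypothetical, hand-written pronunciation table (toneless pinyin syllables).
PINYIN = {
    "银行": ["yin", "hang"],
    "引航": ["yin", "hang"],
    "以后": ["yi", "hou"],
    "余额": ["yu", "e"],
}


def full_pinyin(word):
    """Whole pinyin of a word, e.g. '银行' -> 'yinhang'."""
    return "".join(PINYIN[word])


def initials(word):
    """Initial letter of each syllable, e.g. '银行' -> 'yh'."""
    return "".join(syllable[0] for syllable in PINYIN[word])


def search_by_pronunciation(query, candidates, key=full_pinyin):
    """Return candidates whose pronunciation key equals that of the query."""
    target = key(query)
    return [c for c in candidates if c != query and key(c) == target]
```

Passing `key=initials` loosens the condition from identical full pinyin to identical initial letters, which returns more candidates at the cost of precision.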
S420, determining a plurality of second semantic results according to the supplementary corpus and the first semantic results;
Specifically, the first semantic result is supplemented with the supplementary corpus sent back by the cloud. For example, several key texts in the supplementary corpus may be selected and added to the first semantic result, so that a plurality of second semantic results are generated on the basis of the first semantic result.
It can be understood that if different amounts or different key texts are selected from the supplementary corpus and added to the first semantic result, different second semantic results can be obtained, and therefore, a plurality of second semantic results may be obtained corresponding to the same supplementary corpus and the same first semantic result.
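The way one supplementary corpus yields several second semantic results can be sketched as follows. The representation is hypothetical: the application does not define a data structure for semantic results, so the dictionary layout, the function name `generate_second_results`, and the `max_added` limit are assumptions for illustration only.

```python
from itertools import combinations


def generate_second_results(first_result, key_texts, max_added=2):
    """Build candidate second semantic results.

    Each candidate is the first semantic result plus a non-empty subset of
    key texts from the supplementary corpus, with at most max_added texts.
    """
    candidates = []
    for size in range(1, min(max_added, len(key_texts)) + 1):
        for subset in combinations(key_texts, size):
            candidates.append({"base": first_result, "added": list(subset)})
    return candidates
```

With three key texts and `max_added=2`, this produces three single-text candidates and three two-text candidates, which matches the observation that different amounts or different key texts give different second semantic results.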
S430, calculating a second confidence of each second semantic result according to the industry knowledge base and the scene corpus;
Specifically, similarly to step S400, a second confidence is calculated for each of the second semantic results obtained in step S420. Like the first confidence, the second confidence represents how well a second semantic result fits the corresponding industry and business scene, and reflects the reliability of that second semantic result.
S440, returning the second semantic result with the highest second confidence to the terminal;
Specifically, the highest of the second confidences calculated in step S430 is selected. The second semantic result corresponding to it may be regarded as the most reliable semantic processing result, so the edge node returns that second semantic result to the terminal as the result of the current semantic processing.
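Steps S400 to S440 can be summarized in a minimal sketch. It rests on several assumptions that are not stated in the application: a semantic result is modeled as a list of terms, the confidence is a toy score (the share of terms known to the local knowledge base and scene corpus), the cloud is a hypothetical callable `request_cloud`, each second semantic result adds exactly one key text, and the first result is returned as a fallback when the cloud sends nothing back.

```python
def confidence(result, reference_terms):
    """Toy confidence (assumption): share of result terms known locally."""
    if not result:
        return 0.0
    return sum(term in reference_terms for term in result) / len(result)


def update_semantic_result(first_result, reference_terms, request_cloud,
                           threshold=0.8):
    # S400: score the first semantic result against local knowledge.
    first_conf = confidence(first_result, reference_terms)
    if first_conf >= threshold:
        return first_result, first_conf  # reliable: return directly

    # S410: ask the cloud for a supplementary corpus (a list of key texts).
    supplement = request_cloud(first_result)

    # S420: build second semantic results by adding one key text each.
    candidates = [first_result + [text] for text in supplement]
    if not candidates:
        return first_result, first_conf  # fallback, an assumption

    # S430: score every candidate in the same way as in S400.
    scored = [(confidence(c, reference_terms), c) for c in candidates]

    # S440: return the candidate with the highest second confidence.
    best_conf, best = max(scored, key=lambda pair: pair[0])
    return best, best_conf
```

The threshold value of 0.8 is arbitrary; the application only requires that a preset confidence threshold exists.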
In some embodiments, the supplementary corpus obtained in step S410 may be added to the corresponding industry knowledge base and scene corpus. Each time a new supplementary corpus is added, the industry language model may be iteratively updated according to the updated industry knowledge base and scene corpus, so that the industry language model keeps up with the corpus update speed of the corresponding industry as far as possible, fits the corresponding industry and business scene better, and produces semantic processing results with higher confidence.
Through steps S400 to S440, the embodiments of the present application provide a scheme for updating the semantic result according to third-party resources. For industries whose corpus changes quickly, the industry knowledge base and the scene corpus can be updated in time from third-party resources, and the industry language model can then be updated with the updated industry knowledge base and scene corpus, so that the model keeps fitting the corpus of the industry closely. This helps improve the accuracy of semantic recognition and benefits semantic processing services in different industry scenes.
According to one or more of the above embodiments, the present application provides a semantic understanding method executed by an edge node. The edge node first obtains a corpus to be processed sent from a terminal, and performs industry feature matching on the corpus to be processed according to an industry knowledge base and a scene corpus located at the edge node, to determine the corpus scene corresponding to the corpus to be processed. According to the corpus scene, a corresponding industry language model is selected to process the corpus to be processed and generate a first semantic result, and the edge node sends the first semantic result back to the terminal to complete the semantic processing. Processing the corpus to be processed with an industry language model helps the semantic processing result fit the corresponding industry and business scene. In addition, the semantic processing is mainly completed at the edge node, which helps meet users' security and privacy requirements and supports wider adoption of semantic processing technology.
Referring to FIG. 5, which is a schematic diagram of an edge node-based semantic processing system according to an embodiment of the present application, the apparatus 500 is applied to an edge node in the edge node-based semantic processing system and includes a first module 510, a second module 520, a third module 530, a fourth module 540, and a fifth module 550; the first module is used for acquiring a corpus to be processed; the second module is used for performing industry feature matching on the corpus to be processed and determining a corpus scene; the third module is used for determining an industry language model according to the corpus scene; the fourth module is used for determining a first semantic result according to the industry language model and the corpus to be processed; and the fifth module is used for returning the first semantic result to the terminal.
In other embodiments, the edge node-based semantic processing system provided in this embodiment further includes a sixth module, a seventh module, an eighth module, a ninth module, and a tenth module; the sixth module is used for calculating a first confidence coefficient of the first semantic result according to the industry knowledge base and the scene corpus which are positioned at the edge node; the seventh module is used for acquiring the supplementary corpus through the cloud when the first confidence coefficient is lower than a preset confidence coefficient threshold; the eighth module is used for determining a plurality of second semantic results according to the supplementary corpus and the first semantic results; the ninth module is used for calculating a second confidence coefficient of the second semantic result according to the industry knowledge base and the scene corpus; and the tenth module is used for returning the second semantic result with the highest second confidence coefficient to the terminal.
Referring to FIG. 6, which is a schematic diagram of an apparatus 600 provided in an embodiment of the present application, the apparatus 600 includes at least one processor 610 and at least one memory 620 for storing at least one program; one processor and one memory are taken as an example in FIG. 6.
The processor and the memory may be connected by a bus or by other means; connection by a bus is taken as an example in FIG. 6.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The above-described embodiments of the apparatus are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may also be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
The embodiment of the application also discloses a computer storage medium, in which a program executable by a processor is stored, wherein the program executable by the processor is used for realizing the semantic processing method based on the edge node when being executed by the processor.
One of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
While the preferred embodiments of the present invention have been described, the present invention is not limited to the above embodiments, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and such equivalent modifications or substitutions are included in the scope of the present invention defined by the claims.

Claims (9)

1. An edge node-based semantic processing method, wherein the method is executed by an edge node in an edge node-based semantic processing system, the edge node-based semantic processing system comprising the edge node and a terminal, the method comprising:
obtaining a corpus to be processed;
performing industry feature matching on the corpus to be processed, according to an industry knowledge base and a scene corpus located at the edge node, to determine a corpus scene;
determining an industry language model according to the corpus scene;
determining a first semantic result according to the industry language model and the corpus to be processed; the first semantic result comprises a request type and a request intent;
and returning the first semantic result to the terminal.
2. The edge node-based semantic processing method according to claim 1, wherein the edge node-based semantic processing system further comprises a cloud, and the method further comprises:
calculating a first confidence coefficient of the first semantic result according to the industry knowledge base and the scene corpus;
when the first confidence coefficient is lower than a preset confidence coefficient threshold value, a cooperative processing request is sent to the cloud end, so that the cloud end obtains a supplementary corpus, and the supplementary corpus is returned to the edge node;
determining a plurality of second semantic results according to the supplementary corpus and the first semantic results;
calculating a second confidence of the second semantic result according to the industry knowledge base and the scene corpus;
and returning the second semantic result with the highest second confidence coefficient to the terminal.
3. The edge node-based semantic processing method according to claim 2, further comprising:
and adding the supplementary corpora into the industry knowledge base and the scene corpus.
4. The edge node-based semantic processing method according to claim 2, wherein the enabling the cloud to obtain the supplementary corpus comprises:
according to a preset retrieval condition, the cloud retrieves the supplementary corpus in the Internet;
wherein the retrieval condition comprises pronunciation identity with the first semantic result.
5. The edge node-based semantic processing method according to any one of claims 1 to 4, further comprising a construction process of the industry language model, specifically comprising:
randomly extracting N types of samples from the obtained corpus samples to serve as a first sample set, wherein the first sample set comprises N types of first samples; wherein the sample categories in the first sample set comprise scene results and target topics;
in each class of the first samples, extracting K instances as a first instance set, the first instance set comprising K first instances; wherein the instance is a feature word;
taking all the extracted first instances as a support set, and taking all the instances except the first instances in the first sample set as a query set; wherein, the support set is used for model training, and the query set is used for model testing;
training and testing the industry language model by utilizing the support set and the query set;
in the training process, the weight of the marked example in the first example is gradually increased by adopting a method of increasing the gradient weight;
when the training times reach a preset first number, completing the construction of the industry language model;
wherein N and K are both positive integers.
6. An edge node-based semantic processing system, wherein the apparatus is applied to an edge node in an edge node-based semantic processing system, the edge node-based semantic processing system includes an edge node and a terminal, and the apparatus includes: a first module, a second module, a third module, a fourth module and a fifth module;
the first module is used for acquiring a corpus to be processed;
the second module is used for performing industry feature matching on the corpus to be processed and determining a corpus scene;
the third module is used for determining an industry language model according to the corpus scene;
the fourth module is used for determining a first semantic result according to the industry language model and the corpus to be processed;
and the fifth module is used for returning the first semantic result to the terminal.
7. The edge node-based semantic processing system of claim 6 further comprising a cloud, the apparatus further comprising: a sixth module, a seventh module, an eighth module, a ninth module, and a tenth module;
the sixth module is used for calculating a first confidence coefficient of the first semantic result according to an industry knowledge base and a scene knowledge base which are positioned at the edge node;
the seventh module is configured to obtain, through the cloud, a supplementary corpus when the first confidence is lower than a preset confidence threshold;
the eighth module is used for determining a plurality of second semantic results according to the supplementary corpus and the first semantic results;
the ninth module is configured to calculate a second confidence of the second semantic result according to the industry knowledge base and the scenarized knowledge base;
the tenth module is configured to return the second semantic result with the highest second confidence to the terminal.
8. An apparatus, comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the edge node-based semantic processing method of any one of claims 1-5.
9. A computer storage medium in which a processor-executable program is stored, wherein the processor-executable program, when executed by the processor, is configured to implement the edge node-based semantic processing method according to any one of claims 1 to 5.
CN202111165947.4A 2021-09-30 2021-09-30 Semantic processing method, system and device based on edge node and storage medium Pending CN113946668A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111165947.4A CN113946668A (en) 2021-09-30 2021-09-30 Semantic processing method, system and device based on edge node and storage medium

Publications (1)

Publication Number Publication Date
CN113946668A true CN113946668A (en) 2022-01-18

Family

ID=79329837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111165947.4A Pending CN113946668A (en) 2021-09-30 2021-09-30 Semantic processing method, system and device based on edge node and storage medium

Country Status (1)

Country Link
CN (1) CN113946668A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination