CN107885879A - Semantic analysis method, apparatus, electronic device and computer-readable storage medium - Google Patents


Info

Publication number
CN107885879A
CN107885879A (application CN201711230879.9A)
Authority
CN
China
Prior art keywords
word
candidate
feature
preset
probability value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711230879.9A
Other languages
Chinese (zh)
Inventor
李泽中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaodu Information Technology Co Ltd
Original Assignee
Beijing Xiaodu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaodu Information Technology Co Ltd filed Critical Beijing Xiaodu Information Technology Co Ltd
Priority to CN201711230879.9A priority Critical patent/CN107885879A/en
Publication of CN107885879A publication Critical patent/CN107885879A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/36 — Information retrieval; creation of semantic tools, e.g. ontology or thesauri
    • G06F16/3344 — Query execution using natural language analysis
    • G06F16/951 — Indexing; web crawling techniques
    • G06F40/30 — Handling natural language data; semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present disclosure disclose a semantic analysis method, an apparatus, an electronic device and a computer-readable storage medium. The semantic analysis method includes: obtaining a candidate word set; calculating the probability that a word in the candidate word set is a preset word; and confirming words whose probability satisfies a preset condition as target words. The disclosure can assist in accurately identifying user intent, improve the retrieval hit rate, effectively improve the service quality of merchants or service providers, and enhance user experience.

Description

Semantic analysis method, apparatus, electronic device and computer-readable storage medium
Technical field
The present disclosure relates to the technical field of information processing, and in particular to a semantic analysis method, an apparatus, an electronic device and a computer-readable storage medium.
Background
With the development of Internet technology, more and more merchants and service providers offer services to users through Internet platforms, striving to improve service quality, enhance user experience and win more user orders, so as to raise the utilization of existing resources and create more value for themselves. However, when users currently use the retrieval services provided by merchants or service providers, the hit rate of retrieval results cannot meet users' requirements, which weakens user experience.
Summary of the invention
Embodiments of the present disclosure provide a semantic analysis method, an apparatus, an electronic device and a computer-readable storage medium.
In a first aspect, an embodiment of the present disclosure provides a semantic analysis method.
Specifically, the semantic analysis method includes:
obtaining a candidate word set;
calculating the probability that a word in the candidate word set is a preset word; and
confirming words whose probability satisfies a preset condition as target words.
With reference to the first aspect, in a first implementation of the first aspect of the disclosure, the obtaining a candidate word set includes:
obtaining an input character string;
splitting the input character string to obtain candidate words; and
generating the candidate word set based on the obtained candidate words.
With reference to the first aspect, in a first implementation of the first aspect of the disclosure, the calculating the probability that a word in the candidate word set is a preset word includes:
determining feature data;
obtaining training word data;
training based on the feature data and the training word data to obtain weight values of the feature data; and
calculating, based on the weight values of the feature data, the probability that a candidate word is a preset word.
With reference to the first aspect, in a first implementation of the first aspect of the disclosure, the feature data includes one or more of: the number of times a word w occurs in the current input character string; the number of times the word w occurred in input character strings within a preset historical time period; the words adjacent to the word w; the part of speech of the word w; the parts of speech of the adjacent words; and whether the word w is a preset name.
With reference to the first aspect, in a first implementation of the first aspect of the disclosure, the training word data includes positive-sample words and negative-sample words.
With reference to the first aspect, in a first implementation of the first aspect of the disclosure, the obtaining training word data includes:
performing a preset operation on words to obtain preset operation data;
calculating the matching degree between each word and the preset operation data; and
determining words whose matching degree is greater than or equal to a preset matching-degree threshold as positive-sample words, and words whose matching degree is less than the preset matching-degree threshold as negative-sample words.
With reference to the first aspect, in a first implementation of the first aspect of the disclosure, the training based on the feature data and the training word data to obtain weight values of the feature data includes:
training based on the feature data and the training word data to obtain a feature-weight prediction model; and
predicting the weights corresponding to the feature data based on the feature-weight prediction model.
With reference to the first aspect, in a first implementation of the first aspect of the disclosure, the probability p(w) that a candidate word w is a preset word is calculated from the weight values of the feature data using the following formula:

p(w) = 1 / (1 + exp(−Σᵢ λᵢfᵢ))

where fᵢ denotes the i-th feature in the feature data and λᵢ denotes the weight value corresponding to the i-th feature fᵢ.
With reference to the first aspect, in a first implementation of the first aspect of the disclosure, the confirming words whose probability satisfies a preset condition as target words includes:
determining words whose probability is greater than a preset probability threshold as target words.
With reference to the first aspect and the first implementation of the first aspect, in a second implementation of the first aspect of the disclosure, the method further includes: performing a preset operation on the target words.
In a second aspect, an embodiment of the present disclosure provides a semantic analysis apparatus.
Specifically, the semantic analysis apparatus includes:
an acquisition module configured to obtain a candidate word set;
a calculation module configured to calculate the probability that a word in the candidate word set is a preset word; and
a confirmation module configured to confirm words whose probability satisfies a preset condition as target words.
With reference to the second aspect, in a first implementation of the second aspect of the disclosure, the acquisition module includes:
a first acquisition submodule configured to obtain an input character string;
a splitting submodule configured to split the input character string to obtain candidate words; and
a generation submodule configured to generate the candidate word set based on the obtained candidate words.
With reference to the second aspect, in a first implementation of the second aspect of the disclosure, the calculation module includes:
a determination submodule configured to determine feature data;
a second acquisition submodule configured to obtain training word data;
a training submodule configured to train, based on the feature data and the training word data, to obtain weight values of the feature data; and
a calculation submodule configured to calculate, based on the weight values of the feature data, the probability that a candidate word is a preset word.
With reference to the second aspect, in a first implementation of the second aspect of the disclosure, the feature data includes one or more of: the number of times a word w occurs in the current input character string; the number of times the word w occurred in input character strings within a preset historical time period; the words adjacent to the word w; the part of speech of the word w; the parts of speech of the adjacent words; and whether the word w is a preset name.
With reference to the second aspect, in a first implementation of the second aspect of the disclosure, the training word data includes positive-sample words and negative-sample words.
With reference to the second aspect, in a first implementation of the second aspect of the disclosure, the second acquisition submodule includes:
an execution unit configured to perform a preset operation on words to obtain preset operation data;
a calculation unit configured to calculate the matching degree between each word and the preset operation data; and
a determination unit configured to determine words whose matching degree is greater than or equal to a preset matching-degree threshold as positive-sample words, and words whose matching degree is less than the preset matching-degree threshold as negative-sample words.
With reference to the second aspect, in a first implementation of the second aspect of the disclosure, the training submodule includes:
a training unit configured to train based on the feature data and the training word data to obtain a feature-weight prediction model; and
a prediction unit configured to predict the weights corresponding to the feature data based on the feature-weight prediction model.
With reference to the second aspect, in a first implementation of the second aspect of the disclosure, the calculation submodule is configured to calculate the probability p(w) that a candidate word w is a preset word from the weight values of the feature data using the following formula:

p(w) = 1 / (1 + exp(−Σᵢ λᵢfᵢ))

where fᵢ denotes the i-th feature in the feature data and λᵢ denotes the weight value corresponding to the i-th feature fᵢ.
With reference to the second aspect, in a first implementation of the second aspect of the disclosure, the confirmation module is configured to determine words whose probability is greater than a preset probability threshold as target words.
With reference to the second aspect and the first implementation of the second aspect, in a second implementation of the second aspect of the disclosure, the apparatus further includes: an execution module configured to perform a preset operation on the target words.
In a third aspect, an embodiment of the present disclosure provides an electronic device including a memory and a processor, where the memory is configured to store one or more computer instructions that support the semantic analysis apparatus in performing the semantic analysis method of the first aspect, and the processor is configured to execute the computer instructions stored in the memory. The semantic analysis apparatus may further include a communication interface for communicating with other devices or a communication network.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium for storing the computer instructions used by the semantic analysis apparatus, including the computer instructions involved in performing the semantic analysis method of the first aspect.
The technical solutions provided by the embodiments of the present disclosure can have the following beneficial effects:
The above technical solution determines by analysis whether a word input by a user is a preset word such as an important word, so that a corresponding retrieval strategy can be formulated for retrieval or other preset operations. This helps to accurately identify user intent, effectively improves the service quality of merchants or service providers, enhances user experience, attracts more users, and creates more value for merchants or service providers.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the disclosure.
Brief description of the drawings
Other features, objects and advantages of the disclosure will become more apparent from the following detailed description of non-limiting embodiments, taken in conjunction with the accompanying drawings, in which:
Fig. 1 shows a flowchart of a semantic analysis method according to an embodiment of the disclosure;
Fig. 2 shows a flowchart of step S101 of the embodiment shown in Fig. 1;
Fig. 3 shows a flowchart of step S102 of the embodiment shown in Fig. 1;
Fig. 4 shows a flowchart of step S302 of the embodiment shown in Fig. 3;
Fig. 5 shows a flowchart of step S303 of the embodiment shown in Fig. 3;
Fig. 6 shows a structural block diagram of a semantic analysis apparatus according to an embodiment of the disclosure;
Fig. 7 shows a structural block diagram of the acquisition module 601 of the embodiment shown in Fig. 6;
Fig. 8 shows a structural block diagram of the calculation module 602 of the embodiment shown in Fig. 6;
Fig. 9 shows a structural block diagram of the second acquisition submodule 802 of the embodiment shown in Fig. 8;
Fig. 10 shows a structural block diagram of the training submodule 803 of the embodiment shown in Fig. 8;
Fig. 11 shows a structural block diagram of an electronic device according to an embodiment of the disclosure;
Fig. 12 shows a schematic structural diagram of a computer system suitable for implementing the semantic analysis method according to an embodiment of the disclosure.
Detailed description of the embodiments
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. For the sake of clarity, parts unrelated to the description of the exemplary embodiments are omitted from the drawings.
In the present disclosure, it should be understood that terms such as "comprising" or "having" are intended to indicate the presence of the features, numbers, steps, acts, components, parts or combinations thereof disclosed in this specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, acts, components, parts or combinations thereof exist or are added.
It should also be noted that, unless they conflict, the embodiments of the disclosure and the features in those embodiments may be combined with each other. The disclosure is described in detail below with reference to the drawings and in conjunction with the embodiments.
The technical solution provided by the embodiments of the present disclosure determines by analysis whether a word input by a user is a preset word such as an important word, so that a corresponding retrieval strategy can be formulated for retrieval or other preset operations. This helps to accurately identify user intent, effectively improves the service quality of merchants or service providers, enhances user experience, attracts more users, and creates more value for merchants or service providers.
The technical solution of the disclosure may be applied to retrieving, searching and matching words. For convenience of description, the technical solution is described in detail below by taking retrieval as an example.
Fig. 1 shows a flowchart of a semantic analysis method according to an embodiment of the disclosure. As shown in Fig. 1, the semantic analysis method includes the following steps S101-S103:
In step S101, a candidate word set is obtained;
In step S102, the probability that a word in the candidate word set is a preset word is calculated;
In step S103, words whose probability satisfies a preset condition are confirmed as target words.
Currently, when a user uses the retrieval service provided by a merchant or service provider, the merchant or service provider typically retrieves directly with the character string input by the user as the retrieval object. The results obtained in this way naturally contain much noise and cannot accurately surface the content the user wants to see; that is, the hit rate of retrieval results cannot meet the user's requirements, which reduces the quality of the merchant's or service provider's service and weakens user experience.
In this embodiment, a semantic analysis method is proposed that determines by analysis whether a word input by a user is a preset word such as an important word, in order to assist in subsequently formulating a corresponding retrieval strategy for retrieval or other preset operations. Specifically, a candidate word set is first obtained; the probability that each word in the candidate word set is a preset word is then calculated; finally, words whose probability satisfies a preset condition are confirmed as target words, and retrieval or other preset operations can subsequently be performed based on the target words. This technical solution can improve the hit rate of retrieval results, improve the quality of the merchant's or service provider's service, and enhance user experience.
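The flow of steps S101-S103 can be sketched as a small pipeline. This is an illustrative sketch only, not the patented implementation: the whitespace segmentation, the length-based scorer and the threshold value are all hypothetical stand-ins.

```python
def get_candidate_word_set(input_string):
    # S101: obtain the candidate word set (naive whitespace segmentation stand-in).
    return set(input_string.split())

def preset_word_probability(word):
    # S102 stand-in scorer: longer words are treated as more likely
    # to be important preset words.
    return min(1.0, len(word) / 10.0)

def select_target_words(input_string, threshold=0.55):
    candidates = get_candidate_word_set(input_string)             # S101
    scored = {w: preset_word_probability(w) for w in candidates}  # S102
    return {w for w, p in scored.items() if p > threshold}        # S103

print(sorted(select_target_words("order spicy hotpot delivery")))
# → ['delivery', 'hotpot'] with this stand-in scorer
```

In a real system the scorer would be the trained model described below, and the threshold the preset probability threshold.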
In an optional implementation of this embodiment, as shown in Fig. 2, step S101 of obtaining the candidate word set includes steps S201-S203:
In step S201, an input character string is obtained;
In step S202, the input character string is split to obtain candidate words;
In step S203, the candidate word set is generated based on the obtained candidate words.
Usually, when performing a retrieval, a user cannot predict which word or words are more effective for the retrieval. It is therefore necessary to extract, from the character string input by the user, the words that are more effective for retrieval. Specifically, in this embodiment, after the user inputs a character string, word segmentation is first performed on the input character string to obtain several candidate words; these candidate words form the candidate word set, which serves as the basis for subsequently analyzing the retrieval effectiveness, in other words the importance, of one or more of the words.
There are many methods for splitting a character string into words; those skilled in the art can choose according to the needs of the practical application, and the disclosure does not specifically limit the string-splitting method.
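The disclosure leaves the segmentation method open. One common choice for splitting a string without delimiters is greedy forward maximum matching against a dictionary; the sketch below illustrates that choice under an assumed toy dictionary, and is not mandated by the disclosure.

```python
def greedy_segment(text, dictionary, max_len=4):
    """Greedy forward maximum matching: at each position, take the longest
    dictionary word starting there, falling back to a single character."""
    words, i = [], 0
    while i < len(text):
        match = text[i]  # fallback: one character
        for length in range(min(max_len, len(text) - i), 1, -1):
            if text[i:i + length] in dictionary:
                match = text[i:i + length]
                break
        words.append(match)
        i += len(match)
    return words

dictionary = {"hot", "pot", "hotpot", "deliver", "delivery"}
print(greedy_segment("hotpotdelivery", dictionary, max_len=8))
# → ['hotpot', 'delivery']
```

Note that "hotpot" wins over "hot" because longer matches are tried first, which is the point of maximum matching.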
In an optional implementation of this embodiment, as shown in Fig. 3, step S102 of calculating the probability that a word in the candidate word set is a preset word includes steps S301-S304:
In step S301, feature data is determined;
In step S302, training word data is obtained;
In step S303, weight values of the feature data are obtained by training based on the feature data and the training word data;
In step S304, the probability that a candidate word is a preset word is calculated based on the weight values of the feature data.
In this implementation, the probability that a candidate word in the candidate word set is a preset word is calculated by model training, for example the probability that a certain candidate word is an important word of high importance for retrieval. Specifically, the feature data to be used is first determined and the training word data is obtained; the weight values of the feature data are then obtained by training based on the feature data and the training word data; finally, the probability that the candidate word is a preset word is calculated based on the weight values of the feature data.
The feature data may include one or more of: the number of times a certain word w occurs in the character string currently input by the user; the number of times the word w occurred, within a preset historical time period such as the past month or the past year, in the character strings input by this user or by all users of an Internet platform; the words adjacent to the word w; the part of speech of the word w; the parts of speech of the adjacent words; and whether the word w has occurred as a preset name, for example whether it is the name of a merchant/service provider or of a product/service. The part of speech may be, for example, a noun, a verb, an adjective or an adverb.
The training word data includes positive-sample words and negative-sample words.
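The feature data listed above can be gathered into a per-word feature vector. The sketch below is a hypothetical illustration: `historical_counts`, `preset_names` and `pos_tags` stand in for the query logs, merchant/product name lists and part-of-speech tagger a real system would need.

```python
def extract_features(w, query_words, historical_counts, preset_names, pos_tags):
    """Build a numeric feature vector for word w from the feature data
    described above; all lookups are hypothetical stand-ins."""
    i = query_words.index(w)
    return {
        "count_in_query": query_words.count(w),
        "historical_count": historical_counts.get(w, 0),
        "prev_word_is_noun": float(pos_tags.get(query_words[i - 1], "") == "n") if i > 0 else 0.0,
        "is_noun": float(pos_tags.get(w, "") == "n"),
        "is_preset_name": float(w in preset_names),
    }

feats = extract_features(
    "hotpot", ["order", "hotpot"],
    historical_counts={"hotpot": 120},
    preset_names={"hotpot"},
    pos_tags={"hotpot": "n", "order": "v"},
)
print(feats)
```

Each entry corresponds to one fᵢ in the formula of the disclosure; the training step below assigns each a weight λᵢ.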
In an optional implementation of this embodiment, as shown in Fig. 4, step S302 of obtaining the training word data includes steps S401-S403:
In step S401, a preset operation is performed on words to obtain preset operation data;
In step S402, the matching degree between each word and the preset operation data is calculated;
In step S403, words whose matching degree is greater than or equal to a preset matching-degree threshold are determined as positive-sample words, and words whose matching degree is less than the preset matching-degree threshold are determined as negative-sample words.
To improve the prediction accuracy of the trained model, suitable training data must be selected. In this embodiment, the model training data is selected based on the matching degree between words and preset operation results. Specifically, a preset operation such as a retrieval operation is first performed on a large number of words, obtaining preset operation data, i.e. retrieval results; the matching degree between each word and its retrieval results is then calculated. Words whose matching degree is greater than or equal to the preset matching-degree threshold can serve as positive-sample words; conversely, words whose matching degree is less than the threshold can serve as negative-sample words. This works because when a user inputs a character string, performs a retrieval, and then clicks an entry in the retrieval results, it largely indicates that this entry meets the user's retrieval requirement.
The matching-degree threshold can be set according to the needs of the practical application, and the disclosure does not specifically limit its value.
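The positive/negative split of steps S401-S403 can be sketched as follows. The matching degree here is a stand-in click-through rate between each word and its retrieval results; the disclosure does not fix how the matching degree is computed, so both the scores and the threshold are assumptions.

```python
def label_training_words(click_rates, threshold=0.5):
    """Split words into positive/negative samples by comparing a
    matching-degree score (here, a stand-in click-through rate between
    the word and its retrieval results) against a preset threshold."""
    positives = {w for w, rate in click_rates.items() if rate >= threshold}
    negatives = {w for w, rate in click_rates.items() if rate < threshold}
    return positives, negatives

pos, neg = label_training_words({"hotpot": 0.8, "the": 0.1, "delivery": 0.6})
print(pos, neg)
# pos contains 'hotpot' and 'delivery'; neg contains 'the'
```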
In an optional implementation of this embodiment, as shown in Fig. 5, step S303 of obtaining the weight values of the feature data by training based on the feature data and the training word data includes steps S501-S502:
In step S501, training is performed based on the feature data and the training word data to obtain a feature-weight prediction model;
In step S502, the weights corresponding to the feature data are predicted based on the feature-weight prediction model.
As mentioned above, the disclosure considers many kinds of feature data, each of which can be used to characterize the importance of a word. For example, the more times a certain word w occurs in the character string currently input by the user, the more important the word is. In some cases, the fewer times the word w occurred, within a preset historical time period such as the past month or the past year, in the character strings input by this user or by all users of an Internet platform, the more unique the word is and hence the more important it is. In some cases, nouns are more important than verbs, adjectives and adverbs. In some scenarios, differences in the words adjacent to a word w, and whether w is a preset name, can also affect the word's importance. However, these features contribute differently to judging a word's importance; that is, when the above features are used together to characterize the importance of a word, their weights should not be treated equally but should take different values.
Therefore, in this embodiment, model training is used to predict, or in other words estimate, an optimal weight distribution for the above features. For example, a logistic regression model, a simple, efficient and very widely used model in machine learning, may be used to estimate the probability. Training a logistic regression model on the feature data and the training word data yields a feature-weight prediction model, from which a set of feature weight values corresponding to the feature data can be obtained; as a rule, an optimization algorithm can find a set of optimal feature weight values corresponding to the feature data.
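A logistic regression model of the kind mentioned above can be fitted with plain gradient descent. The sketch below uses a tiny hand-made dataset (two assumed features per word) purely for illustration; a production system would use a mature library and real log data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_weights(samples, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights λ_i by stochastic gradient descent,
    so that sigmoid(Σ λ_i f_i) approximates the preset-word probability."""
    n = len(samples[0])
    weights = [0.0] * n
    for _ in range(epochs):
        for feats, y in zip(samples, labels):
            p = sigmoid(sum(l * f for l, f in zip(weights, feats)))
            for i in range(n):
                weights[i] += lr * (y - p) * feats[i]
    return weights

# Toy data: features are [count_in_query, is_noun]; nouns are positives.
samples = [[1.0, 1.0], [1.0, 0.0], [2.0, 1.0], [1.0, 0.0]]
labels = [1, 0, 1, 0]
weights = train_weights(samples, labels)
probs = [sigmoid(sum(l * f for l, f in zip(weights, feats))) for feats in samples]
print([round(p, 2) for p in probs])
```

On this separable toy data the fitted weights drive the positive samples' probabilities toward 1 and the negative samples' toward 0.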
In an optional implementation of this embodiment, step S304 can be implemented as calculating the probability p(w) that a candidate word w is a preset word from the weight values of the feature data using the following formula:

p(w) = 1 / (1 + exp(−Σᵢ λᵢfᵢ))

where fᵢ denotes the i-th feature in the feature data and λᵢ denotes the weight value corresponding to the i-th feature fᵢ.
In an optional implementation of this embodiment, step S103 of confirming words whose probability satisfies a preset condition as target words can be implemented as determining words whose probability is greater than a preset probability threshold as target words.
In this embodiment, a candidate word w whose probability p(w) of being a more important preset word exceeds the preset probability threshold is regarded as a word of relatively strong importance that is relatively effective for preset operations such as retrieval. Such words can therefore participate, as target words, in subsequent preset operations such as retrieval, to improve the retrieval hit rate.
The probability threshold can be set according to the needs of the practical application, and the disclosure does not specifically limit its value.
In an optional implementation of this embodiment, the method further includes the step of performing a preset operation on the target words, where the preset operation includes one or more of retrieval, search and matching.
The following are apparatus embodiments of the present disclosure, which can be used to carry out the method embodiments of the present disclosure.
Fig. 6 shows a structural block diagram of a semantic analysis apparatus according to an embodiment of the disclosure. The apparatus can be implemented by software, hardware or a combination of both as part or all of an electronic device. As shown in Fig. 6, the semantic analysis apparatus includes:
an acquisition module 601 configured to obtain a candidate word set;
a calculation module 602 configured to calculate the probability that a word in the candidate word set is a preset word; and
a confirmation module 603 configured to confirm words whose probability satisfies a preset condition as target words.
Currently, when a user uses the retrieval service provided by a merchant or service provider, the merchant or service provider typically retrieves directly with the character string input by the user as the retrieval object. The results obtained in this way naturally contain much noise and cannot accurately surface the content the user wants to see; that is, the hit rate of retrieval results cannot meet the user's requirements, which reduces the quality of the merchant's or service provider's service and weakens user experience.
In this embodiment, a semantic analysis apparatus is proposed that determines by analysis whether a word input by a user is a preset word such as an important word, in order to assist in subsequently formulating a corresponding retrieval strategy for retrieval or other preset operations. Specifically, the acquisition module 601 first obtains a candidate word set; the calculation module 602 then calculates the probability that a word in the candidate word set is a preset word; finally, the confirmation module 603 confirms words whose probability satisfies a preset condition as target words, and retrieval or other preset operations can subsequently be performed based on the target words. This technical solution can improve the hit rate of retrieval results, improve the quality of the merchant's or service provider's service, and enhance user experience.
In an optional implementation of the present embodiment, as shown in fig. 7, the acquisition module 601 includes:
First acquisition submodule 701, it is configured as obtaining input character string;
Split submodule 702, be configured as splitting the input character string, obtain candidate's word;
Submodule 703 is generated, is configured as generating candidate's word collection based on obtained candidate's word.
Usually, when performing a retrieval, a user cannot predict which word or words are more effective for the retrieval. It is therefore necessary to extract words from the character string entered by the user and to judge which of them are more effective for retrieval. Specifically, in this embodiment, after the first acquisition submodule 701 obtains a character string entered by the user, the splitting submodule 702 first performs word segmentation on the input character string to obtain a number of candidate words; the generation submodule 703 then forms these candidate words into a candidate word set, which serves as the basis for subsequently analyzing the retrieval effectiveness, or in other words the importance, of one or more of these words.
There are many ways to split a character string into words; those skilled in the art may choose according to the needs of the practical application, and the present disclosure places no particular limitation on the specific string-splitting method.
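One simple splitting strategy — assumed here purely for illustration, since the disclosure deliberately leaves the segmentation method open — is to enumerate all short substrings of the input as candidate words, as is sometimes done for unsegmented text:

```python
def candidate_ngrams(text: str, max_len: int = 4) -> set:
    """Enumerate all substrings of up to max_len characters as candidate words.
    A real system would more likely use a dictionary-based or statistical
    segmenter; exhaustive n-gram enumeration is just one simple strategy."""
    return {text[i:i + n]
            for n in range(1, max_len + 1)
            for i in range(len(text) - n + 1)}

print(sorted(candidate_ngrams("abcd", max_len=2)))
# ['a', 'ab', 'b', 'bc', 'c', 'cd', 'd']
```

The resulting set plays the role of the candidate word set produced by the generation submodule 703.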
In an optional implementation of this embodiment, as shown in Fig. 8, the computation module 602 includes:
a determination submodule 801, configured to determine feature data;
a second acquisition submodule 802, configured to obtain training word data;
a training submodule 803, configured to train on the feature data and the training word data to obtain weight values of the feature data;
a calculation submodule 804, configured to calculate, based on the weight values of the feature data, the probability value that a candidate word is a preset word.
In this implementation, the computation module 602 uses model training to calculate the probability value that a candidate word in the candidate word set is a preset word, for example the probability that a certain candidate word is an important word of higher importance for retrieval. Specifically, the determination submodule 801 first determines the feature data to be used; the second acquisition submodule 802 obtains training word data; the training submodule 803 then trains on the feature data and the training word data to obtain the weight values of the feature data; and finally the calculation submodule 804 calculates, based on these weight values, the probability value that a candidate word is a preset word.
The feature data may include one or more of: the number of times a word w occurs in the current input character string entered by the current user; the number of times w occurs, within a preset historical time period such as the past month or year, in the character strings entered by this user or by all users of an Internet platform; the words adjacent to w; the part of speech of w; the parts of speech of the adjacent words; and whether w has occurred as a preset name, for example as the name of a merchant or service provider, or as the name of a product or service. The part of speech may be, for example, a noun, verb, adjective, adverb, and so on.
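A sketch of how such feature data might be assembled into a feature vector for one candidate word follows; the particular features chosen, their order, and the one-hot part-of-speech encoding are assumptions made for illustration, not the patent's encoding.

```python
from collections import Counter

POS_TAGS = ["noun", "verb", "adjective", "adverb"]  # assumed POS inventory

def features(word, query, history_counts, pos_of, preset_names):
    """Build the feature vector f_1..f_n for one candidate word, mirroring
    the feature data listed above: in-query count, historical count,
    preset-name flag, and a one-hot part-of-speech encoding."""
    f = [
        query.count(word),                    # occurrences in current input string
        history_counts.get(word, 0),          # occurrences in historical window
        1.0 if word in preset_names else 0.0  # occurred as a preset name
    ]
    f += [1.0 if pos_of.get(word) == tag else 0.0 for tag in POS_TAGS]
    return f

vec = features("pizza", "pizza pizza near me",
               history_counts=Counter({"pizza": 120}),
               pos_of={"pizza": "noun"},
               preset_names={"pizza"})
print(vec)  # [2, 120, 1.0, 1.0, 0.0, 0.0, 0.0]
```

These vectors are what the training submodule 803 would consume alongside the positive/negative sample labels described next.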
The training word data includes positive sample words and negative sample words.
In an optional implementation of this embodiment, as shown in Fig. 9, the second acquisition submodule 802 includes:
an execution unit 901, configured to perform a preset operation on a word to obtain preset operation data;
a computation unit 902, configured to calculate the matching degree between the word and the preset operation data;
a determination unit 903, configured to determine words whose matching degree is greater than or equal to a preset matching degree threshold as positive sample words, and words whose matching degree is less than the preset matching degree threshold as negative sample words.
To improve the prediction accuracy of the training model, suitable training data must be selected. In this embodiment, the second acquisition submodule 802 selects model training data based on the matching degree between a word and the result of a preset operation. Specifically, the execution unit 901 first performs a preset operation, such as a search operation, for a large number of words and obtains preset operation data, i.e. retrieval results; the computation unit 902 then calculates the matching degree between each word and its retrieval results; the determination unit 903 can then take words whose matching degree is greater than or equal to the preset matching degree threshold as positive sample words, and conversely take words whose matching degree is below the threshold as negative sample words. The rationale is that when a user enters a character string, performs a retrieval, and then clicks an entry in the retrieval results, this largely indicates that the entry meets the user's search requirement.
The matching degree threshold can be set according to the needs of the practical application; the present disclosure places no specific limitation on its value.
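The positive/negative labeling step can be sketched as follows; the 0.6 threshold and the matching-degree values are arbitrary illustrative numbers, and how matching degree is computed (e.g. from click-through on retrieval results) is left open by the disclosure.

```python
def label_samples(words, matching_degree, threshold=0.6):
    """Split words into positive and negative samples by comparing each
    word's matching degree against the preset matching-degree threshold."""
    positive = [w for w in words if matching_degree(w) >= threshold]
    negative = [w for w in words if matching_degree(w) < threshold]
    return positive, negative

degrees = {"pizza": 0.82, "the": 0.10, "delivery": 0.65}
pos, neg = label_samples(degrees, matching_degree=degrees.get)
print(pos, neg)  # ['pizza', 'delivery'] ['the']
```

The two lists together form the training word data consumed by the training submodule.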
In an optional implementation of this embodiment, as shown in Fig. 10, the training submodule 803 includes:
a training unit 1001, configured to train on the feature data and the training word data to obtain a feature weight prediction model;
a prediction unit 1002, configured to predict the weights corresponding to the feature data based on the feature weight prediction model.
As mentioned above, the present disclosure considers many kinds of feature data, each of which can be used to characterize the importance of a word. For example, the more often a certain word w occurs in the current input character string entered by the current user, the more important that word is. In some cases, the fewer times w occurred, within a preset historical time period such as the past month or year, in the character strings entered by this user or by all users of an Internet platform, the more unique the word is and hence the greater its importance. In some cases, nouns are more important than verbs, adjectives, and adverbs. In some scenarios, the words adjacent to w, and whether w is a preset name, can also influence the word's importance. However, these features differ in how strongly they bear on a word's importance; that is, when characterizing the importance of a word with the various features above, the features should not all be treated alike but should carry different weights.
Therefore, in this embodiment, the training submodule 803 uses model training to predict, or in other words estimate, the optimal weight distribution over the various features above. For example, a logistic regression model — simple, efficient, and very widely used in practical machine learning — can be used to estimate the probability value. That is, the training unit 1001 trains on the feature data and the training word data to obtain a feature weight prediction model using logistic regression; the prediction unit 1002 then uses this feature weight prediction model to obtain a set of feature weight values corresponding to the feature data. As a rule, an optimization algorithm can yield a set of optimal feature weight values corresponding to the feature data.
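A minimal sketch of fitting the feature weights with logistic regression follows, here via plain stochastic gradient descent on toy data. The optimizer, learning rate, epoch count, and data are all assumptions for illustration; the disclosure names logistic regression but no particular training procedure.

```python
import math

def train_logreg(X, y, lr=0.5, epochs=2000):
    """Fit feature weights lambda_i by stochastic gradient descent on the
    logistic loss, yielding the feature weight prediction model."""
    n = len(X[0])
    lam = [0.0] * n
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(l * f for l, f in zip(lam, xi))))
            g = p - yi  # gradient of the log-loss w.r.t. the linear score
            lam = [l - lr * g * f for l, f in zip(lam, xi)]
    return lam

# Toy data: a bias feature plus one informative feature (e.g. historical
# frequency) that separates positive from negative sample words.
X = [[1.0, 3.0], [1.0, 2.5], [1.0, 0.5], [1.0, 0.2]]
y = [1, 1, 0, 0]
weights = train_logreg(X, y)
p = 1.0 / (1.0 + math.exp(-sum(l * f for l, f in zip(weights, [1.0, 3.0]))))
print(round(p, 4))  # close to 1 for a clearly positive example
```

In practice a library implementation (e.g. a regularized solver) would be used instead of this hand-rolled loop; the point is only that training yields one weight λᵢ per feature fᵢ.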
In an optional implementation of this embodiment, the calculation submodule 804 can be configured to calculate the probability value p(w) that a candidate word w is a preset word, based on the weight values of the feature data, using the following formula:

p(w) = 1 / (1 + exp(-Σᵢ λᵢfᵢ))
where fᵢ denotes the i-th feature in the feature data, and λᵢ denotes the weight value corresponding to fᵢ.
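The formula above, computed directly:

```python
import math

def default_word_probability(features, weights):
    """p(w) = 1 / (1 + exp(-sum_i lambda_i * f_i)) — the logistic formula
    used by the calculation submodule."""
    z = sum(l * f for l, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

print(default_word_probability([1.0, 2.0], [0.0, 0.0]))  # 0.5 when all weights are zero
```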
In an optional implementation of this embodiment, the confirmation module 603 can be configured to determine words whose probability value exceeds a preset probability threshold as target words.
In this embodiment, a candidate word w whose probability value p(w) of being a more important preset word exceeds the preset probability threshold can be regarded as relatively important, i.e. relatively effective for preset operations such as retrieval. Such a word can therefore participate in subsequent preset operations such as retrieval as a target word, so as to improve the retrieval hit rate.
The probability threshold can be set according to the needs of the practical application; the present disclosure places no specific limitation on its value.
In an optional implementation of this embodiment, the device further includes:
an execution module, configured to perform a preset operation on the target word, wherein the preset operation includes one or more of retrieval, search, and matching.
The present disclosure also discloses an electronic device. Fig. 11 shows a structural block diagram of the electronic device according to an embodiment of the disclosure. As shown in Fig. 11, the electronic device 1100 includes a memory 1101 and a processor 1102, wherein:
the memory 1101 is used to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor 1102 to implement:
obtaining a candidate word set;
calculating the probability value that a word in the candidate word set is a preset word;
confirming a word whose probability value meets a preset condition as a target word.
The one or more computer instructions may also be executed by the processor 1102 to implement:
obtaining the candidate word set, including:
obtaining an input character string;
splitting the input character string to obtain candidate words;
generating the candidate word set based on the obtained candidate words.
Calculating the probability value that a word in the candidate word set is a preset word includes:
determining feature data;
obtaining training word data;
training on the feature data and the training word data to obtain weight values of the feature data;
calculating, based on the weight values of the feature data, the probability value that a candidate word is a preset word.
The feature data includes one or more of: the number of times a word w occurs in the current input character string; the number of times w occurs in input character strings within a preset historical time period; the words adjacent to w; the part of speech of w; the parts of speech of the adjacent words; and whether w is a preset name.
The training word data includes positive sample words and negative sample words.
Obtaining the training word data includes:
performing a preset operation on a word to obtain preset operation data;
calculating the matching degree between the word and the preset operation data;
determining words whose matching degree is greater than or equal to a preset matching degree threshold as positive sample words, and words whose matching degree is less than the preset matching degree threshold as negative sample words.
Training on the feature data and the training word data to obtain the weight values of the feature data includes:
training on the feature data and the training word data to obtain a feature weight prediction model;
predicting the weights corresponding to the feature data based on the feature weight prediction model.
The probability value p(w) that a candidate word w is a preset word is calculated, based on the weight values of the feature data, using the following formula:

p(w) = 1 / (1 + exp(-Σᵢ λᵢfᵢ))

where fᵢ denotes the i-th feature in the feature data, and λᵢ denotes the weight value corresponding to fᵢ.
Confirming a word whose probability value meets a preset condition as a target word includes:
determining a word whose probability value exceeds a preset probability threshold as a target word.
The method further includes:
performing a preset operation on the target word.
Fig. 12 is a schematic structural diagram of a computer system suitable for implementing the semantic analysis method according to an embodiment of the present disclosure.
As shown in Fig. 12, the computer system 1200 includes a central processing unit (CPU) 1201, which can perform the various processes in the embodiments shown in Figs. 1-5 above according to a program stored in a read-only memory (ROM) 1202 or a program loaded from a storage section 1208 into a random access memory (RAM) 1203. The RAM 1203 also stores the various programs and data required for the operation of the system 1200. The CPU 1201, ROM 1202, and RAM 1203 are connected to one another through a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
The following components are connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output section 1207 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage section 1208 including a hard disk and the like; and a communication section 1209 including a network interface card such as a LAN card or a modem. The communication section 1209 performs communication processing via a network such as the Internet. A drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1210 as needed, so that a computer program read therefrom is installed into the storage section 1208 as needed.
In particular, according to an embodiment of the present disclosure, the methods described above with reference to Figs. 1-5 may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the semantic analysis method of Figs. 1-5. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1209, and/or installed from the removable medium 1211.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logic function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or by combinations of special-purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented in software or in hardware. The described units or modules may also be provided in a processor, and under certain conditions the names of these units or modules do not constitute a limitation on the units or modules themselves.
As another aspect, the present disclosure also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable storage medium stores one or more programs, which are used by one or more processors to perform the methods described in the present disclosure.
The above description is merely a preferred embodiment of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present disclosure is not limited to technical solutions formed by the particular combination of the above technical features, and should also cover, without departing from the inventive concept, other technical solutions formed by any combination of the above technical features or their equivalents, such as technical solutions formed by substituting the above features with technical features of similar functions disclosed in (but not limited to) the present disclosure.
The present disclosure discloses A1, a semantic analysis method, the method including: obtaining a candidate word set; calculating the probability value that a word in the candidate word set is a preset word; and confirming a word whose probability value meets a preset condition as a target word. A2, the method according to A1, wherein obtaining the candidate word set includes: obtaining an input character string; splitting the input character string to obtain candidate words; and generating the candidate word set based on the obtained candidate words. A3, the method according to A1, wherein calculating the probability value that a word in the candidate word set is a preset word includes: determining feature data; obtaining training word data; training on the feature data and the training word data to obtain weight values of the feature data; and calculating, based on the weight values of the feature data, the probability value that a candidate word is a preset word. A4, the method according to A3, wherein the feature data includes one or more of: the number of times a word w occurs in the current input character string; the number of times w occurs in input character strings within a preset historical time period; the words adjacent to w; the part of speech of w; the parts of speech of the adjacent words; and whether w is a preset name. A5, the method according to A3, wherein the training word data includes positive sample words and negative sample words. A6, the method according to A5, wherein obtaining the training word data includes: performing a preset operation on a word to obtain preset operation data; calculating the matching degree between the word and the preset operation data; and determining words whose matching degree is greater than or equal to a preset matching degree threshold as positive sample words, and words whose matching degree is less than the preset matching degree threshold as negative sample words. A7, the method according to A3, wherein training on the feature data and the training word data to obtain the weight values of the feature data includes: training on the feature data and the training word data to obtain a feature weight prediction model; and predicting the weights corresponding to the feature data based on the feature weight prediction model. A8, the method according to A3, wherein the probability value p(w) that a candidate word w is a preset word is calculated, based on the weight values of the feature data, using the following formula:

p(w) = 1 / (1 + exp(-Σᵢ λᵢfᵢ))

where fᵢ denotes the i-th feature in the feature data and λᵢ denotes the weight value corresponding to fᵢ. A9, the method according to A1, wherein confirming a word whose probability value meets a preset condition as a target word includes: determining a word whose probability value exceeds a preset probability threshold as a target word. A10, the method according to A1, further including: performing a preset operation on the target word.
The present disclosure discloses B11, a semantic analysis device, the device including: an acquisition module, configured to obtain a candidate word set; a computation module, configured to calculate the probability value that a word in the candidate word set is a preset word; and a confirmation module, configured to confirm a word whose probability value meets a preset condition as a target word. B12, the device according to B11, wherein the acquisition module includes: a first acquisition submodule, configured to obtain an input character string; a splitting submodule, configured to split the input character string to obtain candidate words; and a generation submodule, configured to generate the candidate word set based on the obtained candidate words. B13, the device according to B11, wherein the computation module includes: a determination submodule, configured to determine feature data; a second acquisition submodule, configured to obtain training word data; a training submodule, configured to train on the feature data and the training word data to obtain weight values of the feature data; and a calculation submodule, configured to calculate, based on the weight values of the feature data, the probability value that a candidate word is a preset word. B14, the device according to B13, wherein the feature data includes one or more of: the number of times a word w occurs in the current input character string; the number of times w occurs in input character strings within a preset historical time period; the words adjacent to w; the part of speech of w; the parts of speech of the adjacent words; and whether w is a preset name. B15, the device according to B13, wherein the training word data includes positive sample words and negative sample words. B16, the device according to B15, wherein the second acquisition submodule includes: an execution unit, configured to perform a preset operation on a word to obtain preset operation data; a computation unit, configured to calculate the matching degree between the word and the preset operation data; and a determination unit, configured to determine words whose matching degree is greater than or equal to a preset matching degree threshold as positive sample words, and words whose matching degree is less than the preset matching degree threshold as negative sample words. B17, the device according to B13, wherein the training submodule includes: a training unit, configured to train on the feature data and the training word data to obtain a feature weight prediction model; and a prediction unit, configured to predict the weights corresponding to the feature data based on the feature weight prediction model. B18, the device according to B13, wherein the calculation submodule is configured to calculate the probability value p(w) that a candidate word w is a preset word, based on the weight values of the feature data, using the following formula:

p(w) = 1 / (1 + exp(-Σᵢ λᵢfᵢ))

where fᵢ denotes the i-th feature in the feature data and λᵢ denotes the weight value corresponding to fᵢ. B19, the device according to B11, wherein the confirmation module is configured to determine a word whose probability value exceeds a preset probability threshold as a target word. B20, the device according to B11, further including: an execution module, configured to perform a preset operation on the target word.
The present disclosure discloses C21, an electronic device, including a memory and a processor, wherein the memory is used to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method according to any one of A1-A10.
The present disclosure also discloses D22, a computer-readable storage medium on which computer instructions are stored, the computer instructions, when executed by a processor, implementing the method according to any one of A1-A10.

Claims (10)

1. A semantic analysis method, characterized in that the method includes:
obtaining a candidate word set;
calculating the probability value that a word in the candidate word set is a preset word;
confirming a word whose probability value meets a preset condition as a target word.
2. The method according to claim 1, characterized in that obtaining the candidate word set includes:
obtaining an input character string;
splitting the input character string to obtain candidate words;
generating the candidate word set based on the obtained candidate words.
3. The method according to claim 1, characterized in that calculating the probability value that a word in the candidate word set is a preset word includes:
determining feature data;
obtaining training word data;
training on the feature data and the training word data to obtain weight values of the feature data;
calculating, based on the weight values of the feature data, the probability value that a candidate word is a preset word.
4. The method according to claim 3, characterized in that the feature data includes one or more of: the number of times a word w occurs in the current input character string; the number of times w occurs in input character strings within a preset historical time period; the words adjacent to w; the part of speech of w; the parts of speech of the adjacent words; and whether w is a preset name.
5. The method according to claim 3, characterized in that the training word data includes positive sample words and negative sample words.
6. The method according to claim 3, characterized in that training on the feature data and the training word data to obtain the weight values of the feature data includes:
training on the feature data and the training word data to obtain a feature weight prediction model;
predicting the weights corresponding to the feature data based on the feature weight prediction model.
7. The method according to claim 3, characterized in that the probability value p(w) that a candidate word w is a preset word is calculated, based on the weight values of the feature data, using the following formula:

p(w) = 1 / (1 + exp(-Σᵢ λᵢfᵢ))

where fᵢ denotes the i-th feature in the feature data, and λᵢ denotes the weight value corresponding to fᵢ.
8. A semantic analysis device, characterized in that the device includes:
an acquisition module, configured to obtain a candidate word set;
a computation module, configured to calculate the probability value that a word in the candidate word set is a preset word;
a confirmation module, configured to confirm a word whose probability value meets a preset condition as a target word.
9. An electronic device, characterized by including a memory and a processor, wherein
the memory is used to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method according to any one of claims 1-7.
10. A computer-readable storage medium on which computer instructions are stored, characterized in that the computer instructions, when executed by a processor, implement the method according to any one of claims 1-7.
CN201711230879.9A 2017-11-29 2017-11-29 Semantic analysis, device, electronic equipment and computer-readable recording medium Pending CN107885879A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711230879.9A CN107885879A (en) 2017-11-29 2017-11-29 Semantic analysis, device, electronic equipment and computer-readable recording medium


Publications (1)

Publication Number Publication Date
CN107885879A true CN107885879A (en) 2018-04-06

Family

ID=61776158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711230879.9A Pending CN107885879A (en) 2017-11-29 2017-11-29 Semantic analysis, device, electronic equipment and computer-readable recording medium

Country Status (1)

Country Link
CN (1) CN107885879A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947902A (en) * 2019-03-06 2019-06-28 腾讯科技(深圳)有限公司 A kind of data query method, apparatus and readable medium
CN111079439A (en) * 2019-12-11 2020-04-28 拉扎斯网络科技(上海)有限公司 Abnormal information identification method and device, electronic equipment and computer storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8326861B1 (en) * 2010-06-23 2012-12-04 Google Inc. Personalized term importance evaluation in queries
CN104376065A (en) * 2014-11-05 2015-02-25 百度在线网络技术(北京)有限公司 Determination method and device for importance degree of search word
CN104615723A (en) * 2015-02-06 2015-05-13 百度在线网络技术(北京)有限公司 Determining method and device of search term weight value
CN104866496A (en) * 2014-02-22 2015-08-26 腾讯科技(深圳)有限公司 Method and device for determining morpheme significance analysis model


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIAWEI HAN et al.: "Data Mining: Concepts and Techniques", 31 August 2001, China Machine Press *


Similar Documents

Publication Publication Date Title
US10936626B1 (en) Database and data processing system for use with a network-based personal genetics services platform
CN110427466B (en) Training method and device for neural network model for question-answer matching
Srdjevic et al. Synthesis of individual best local priority vectors in AHP-group decision making
JP6232607B1 (en) Patent requirement conformity prediction device and patent requirement conformity prediction program
CN107369052A (en) User&#39;s registration behavior prediction method, apparatus and electronic equipment
CN107239564A (en) A kind of text label based on supervision topic model recommends method
CN109784959A (en) A kind of target user's prediction technique, device, background server and storage medium
WO2017071369A1 (en) Method and device for predicting user unsubscription
CN112148986B (en) Top-N service re-recommendation method and system based on crowdsourcing
CN107436916A (en) The method and device of intelligent prompt answer
CN111181757A (en) Information security risk prediction method and device, computing equipment and storage medium
CN107885879A (en) Semantic analysis, device, electronic equipment and computer-readable recording medium
CN113407854A (en) Application recommendation method, device and equipment and computer readable storage medium
CN107992570A (en) Character string method for digging, device, electronic equipment and computer-readable recording medium
CN112463989A (en) Knowledge graph-based information acquisition method and system
CN107844584A (en) Usage mining method, apparatus, electronic equipment and computer-readable recording medium
CN111666513A (en) Page processing method and device, electronic equipment and readable storage medium
CN105931055A (en) Service provider feature modeling method for crowdsourcing platform
CN111179055A (en) Credit limit adjusting method and device and electronic equipment
CN112102116A (en) Input prediction method, system, equipment and storage medium based on tourism session
CN108595415B (en) Law differentiation judgment method and device, computer equipment and storage medium
CN110309513A (en) A kind of method and apparatus of context dependent analysis
CN109446518B (en) Decoding method and decoder for language model
CN113962216A (en) Text processing method and device, electronic equipment and readable storage medium
CN112396498A (en) Commodity sales promotion method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180406