WO2020107864A1 - Information processing method, apparatus, service device, and computer-readable storage medium - Google Patents

Information processing method, apparatus, service device, and computer-readable storage medium

Info

Publication number
WO2020107864A1
Authority
WO
WIPO (PCT)
Prior art keywords
discriminator
text
analysis model
word
text information
Prior art date
Application number
PCT/CN2019/091387
Other languages
English (en)
French (fr)
Inventor
吴斌
蒋欣
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2020107864A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Definitions

  • This application relates to the field of machine learning, and in particular to an information processing method, device, service device, and computer-readable storage medium.
  • The word weight of each word in a piece of text information can be used to evaluate the importance of that word within the text information. In a search system, question answering system, or other system, setting appropriate word weights for the words in the text information yields more accurate processing results.
  • Word weights are mainly calculated using the term frequency-inverse document frequency (TF-IDF) algorithm.
  • The main idea of the TF-IDF algorithm is that if a word appears frequently in one document and rarely in other documents, the word is considered to distinguish well between categories, that is, the word has a high weight.
  • The disadvantage of the TF-IDF algorithm is that the word weight of a word is mainly determined by the number of documents in the document collection that contain the word. The correlation between the word weight and the text information containing the word is therefore low, so the word weight does not accurately reflect the importance of the word in the text information, and the accuracy of the word weight is low. How to improve the accuracy of word weights has therefore become an urgent technical problem.
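  • For reference, the TF-IDF weighting criticized above can be sketched as follows. This is a minimal illustration that assumes the common unsmoothed variant (term frequency times log(N / document frequency)); the sample documents and the function name `tf_idf` are hypothetical and do not come from the application.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dict per document; each document is a list of words."""
    n_docs = len(documents)
    # Document frequency: the number of documents that contain each word.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            # Term frequency times inverse document frequency (unsmoothed, assumed variant).
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Hypothetical three-document collection.
docs = [["family", "song", "video"], ["family", "movie"], ["family", "news", "video"]]
tfidf_weights = tf_idf(docs)
```

  • In this sketch "family" occurs in every document, so its weight is zero in all of them regardless of how important it is to any particular text, which is the collection-driven behavior described above.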
  • Embodiments of the present application provide an information processing method, an apparatus for implementing the method, a service device, and a computer-readable storage medium, which can determine the word weight value of each feature word in text information based on the output result obtained by analyzing and recognizing the text information. This raises the correlation between the word weight values of the feature words and that output result, which helps improve the accuracy of the word weight values.
  • An embodiment of the present application provides an information processing method, which includes: acquiring text information; calling a text analysis model to analyze and recognize the text information, and obtaining an output result of the text analysis model; acquiring, according to the output result, the feature weight value that the text analysis model uses for each feature word in the text information during analysis and recognition; and determining the word weight value of each feature word in the text information based on the acquired feature weight values.
  • Because the word weight value of each feature word is determined based on the output result obtained by analyzing and recognizing the text information, the correlation between the word weight values and that output result is high; that is, the correlation between the word weight value of each feature word and the real user intention corresponding to the text information is high. In this way, the accuracy of the word weight values of the feature words can be improved.
  • The text analysis model includes a discriminator, and the text analysis model analyzes and recognizes the text information through the discriminator. A specific implementation of acquiring, according to the output result, the feature weight value that the text analysis model uses for each feature word during analysis and recognition may be: determining a target discriminator from the discriminators included in the text analysis model according to the output result, and acquiring the feature weight value that the target discriminator uses for each feature word in the text information during analysis and recognition.
  • Because the target discriminator is determined according to the output result of the text analysis model rather than chosen at random, the accuracy of the word weight values determined from the target discriminator can be improved.
  • The text analysis model may be a classification model that includes multiple discriminators, each of which corresponds to a classification category. A specific implementation of determining the target discriminator from the discriminators included in the text analysis model according to the output result may be: determining the discriminator corresponding to the target classification category included in the output result as the target discriminator, where the target classification category is determined from the recognition results obtained when each discriminator of the text analysis model analyzes and recognizes the text information.
  • The target classification category can be used to characterize the real user intention of the text information. Determining the discriminator corresponding to the target classification category as the target discriminator, and then determining the word weight value of each feature word from the feature weight value that the target discriminator uses for that feature word during analysis and recognition, helps improve the accuracy of the word weight values.
  • The text analysis model may include multiple discriminators, and the recognition result of each discriminator may be a probability value. The output result may include a target probability value, which may be the maximum of the probability values output by the discriminators of the text analysis model. A specific implementation of determining the target discriminator from the discriminators included in the text analysis model according to the output result may be: determining the discriminator that output the target probability value as the target discriminator.
  • The text analysis model may include multiple discriminators, and each discriminator may correspond to an identifier. A specific implementation of determining the target discriminator from the discriminators included in the text analysis model according to the output result may be: determining the discriminator corresponding to the target identifier included in the output result as the target discriminator, where the target identifier is determined from the recognition results obtained when each discriminator of the text analysis model analyzes and recognizes the text information.
  • A specific implementation of determining the word weight value of each feature word in the text information based on the acquired feature weight values may be: using the feature weight value used for each feature word in the text information as the word weight value of the corresponding feature word.
  • Each discriminator included in the text analysis model can be used to recognize text information of a different classification category, and the same feature word can have different feature weight values in different discriminators.
  • Each discriminator analyzes and recognizes text information through its feature weight values. Because the same feature word has different feature weight values in different discriminators, the discriminators can accurately identify the classification category to which the text information belongs.
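  • The selection of a target discriminator and the reuse of its feature weight values as word weight values can be sketched as follows. This is only an illustration under stated assumptions: each discriminator is modeled as a linear scorer followed by a sigmoid, and the categories, weight values, and names (`DISCRIMINATORS`, `analyze`, `word_weights`) are hypothetical rather than taken from the application.

```python
import math

# Hypothetical discriminators, one per classification category.
# Each maps a feature word to the feature weight value it uses for that word.
DISCRIMINATORS = {
    "children's song": {"family": 0.2, "children's song": 2.5, "video": 0.4},
    "movie": {"family": 0.9, "children's song": -0.5, "video": 1.1},
}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def analyze(feature_words):
    """Return (target category, probability value output by each discriminator)."""
    probabilities = {
        category: sigmoid(sum(weights.get(word, 0.0) for word in feature_words))
        for category, weights in DISCRIMINATORS.items()
    }
    # The target discriminator is the one that outputs the maximum probability value.
    target = max(probabilities, key=probabilities.get)
    return target, probabilities


def word_weights(feature_words):
    """Use the target discriminator's feature weight values as the word weight values."""
    target, _ = analyze(feature_words)
    target_weights = DISCRIMINATORS[target]
    return target, {word: target_weights.get(word, 0.0) for word in feature_words}


target_category, weights = word_weights(["family", "children's song", "video"])
```

  • Note that "family" carries a different weight in each hypothetical discriminator, so the word weight values depend on which discriminator is selected for the text.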
  • A specific implementation of calling the text analysis model to analyze and recognize the text information and obtain the output result may be: performing word segmentation on the text information to obtain each feature word of the text information, and using each feature word as the input of the text analysis model to obtain the output result of the text analysis model.
  • The method may further include: obtaining training sample data that includes historical text information and annotation information, and training a preset model based on the historical text information and annotation information to obtain the text analysis model.
  • The text information may be query information, the historical text information may be historical query information, and the annotation information may be determined according to user operation data on the query results obtained by querying with the historical query information.
  • The historical query information is real query information input by users in the past, and the user operation data is obtained from users' real operations; that is, the text analysis model is trained on real user feedback data. The output result obtained when the text analysis model analyzes and recognizes the query information can therefore better match the real user intention corresponding to the query information, and the word weight values of the feature words obtained from that output result can more objectively reflect the user's real search needs.
  • Querying with the historical query information may return multiple query results, and the user operation data may include the query results, the number of times each query result was selected, and the classification category to which each query result belongs. A specific implementation of training the preset model based on the historical text information and annotation information to obtain the text analysis model may be: inputting the historical query information into the preset model as training data to obtain a training result, and optimizing the parameters of the preset model based on the training result and the annotation information to obtain the text analysis model.
  • The annotation information may be a first classification category determined according to the user operation data, where the first classification category may be the classification category to which the most-selected query result belongs, or the classification category whose query results have the largest total number of selections.
  • The number of selections of each query result obtained by querying with the historical query information can be detected automatically, and the classification category to which the most-selected query result belongs is used as the annotation information. The training data and annotation information therefore require no manual labeling, which effectively reduces the training cost of the model. In addition, the service device can optimize the model automatically, which effectively improves prediction accuracy.
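  • The two ways of deriving annotation information from user operation data described above can be sketched as follows. The record layout `(result_id, category, selections)`, the sample counts, and both function names are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict


def label_by_top_result(query_results):
    """Category of the single query result with the highest number of selections."""
    return max(query_results, key=lambda record: record[2])[1]


def label_by_category_total(query_results):
    """Category whose query results have the largest total number of selections."""
    totals = defaultdict(int)
    for _result_id, category, selections in query_results:
        totals[category] += selections
    return max(totals, key=totals.get)


# Hypothetical user operation data for one historical query:
# (result id, classification category, number of selections).
clicks = [
    ("r1", "children's song", 40),
    ("r2", "movie", 55),
    ("r3", "children's song", 30),
]
top_result_label = label_by_top_result(clicks)
category_total_label = label_by_category_total(clicks)
```

  • With this sample data the two rules disagree: the most-selected single result is a movie, while the "children's song" category has the larger total, so the choice of rule affects the annotation.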
  • The method may further include: performing information processing based on the text information and the word weight value of each feature word of the text information.
  • Performing information processing based on the text information and the word weight value of each feature word yields an information processing result that better matches the user's intention.
  • The text information may be query information. A specific implementation of performing information processing based on the text information and the word weight value of each feature word may be: searching based on the text information and the word weight value of each feature word to obtain a first query result for the text information, and outputting the first query result. Alternatively, a specific implementation may be: searching according to the text information to obtain second query results, sorting the second query results based on the word weight value of each feature word of the text information, and outputting the sorted second query results.
  • Searching based on the word weight values of the feature words effectively improves the accuracy of the first query result recalled by the search, so that the first query result better meets the user's search needs. In addition, sorting the second query results based on the word weight values places the results that better meet the user's search needs nearer the top, which effectively improves the search effect.
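  • Sorting the second query results by word weight values can be sketched as follows. The application does not specify a scoring rule, so this sketch assumes a simple one: each result is scored by the summed weights of the feature words that appear in its title. The titles, weights, and the function name `rerank` are hypothetical.

```python
def rerank(result_titles, word_weights):
    """Sort result titles by the summed weights of the feature words they contain."""
    def score(title):
        # Assumed scoring rule: substring match, weights added up.
        return sum(weight for word, weight in word_weights.items() if word in title)

    return sorted(result_titles, key=score, reverse=True)


# Hypothetical word weight values for the query "family children's song video".
query_weights = {"family": 0.1, "children's song": 0.8, "video": 0.1}
second_query_results = ["family video collection", "children's song video", "family news"]
ranked = rerank(second_query_results, query_weights)
```

  • Because "children's song" has the largest weight, the result containing it moves to the top even though another result matches two lower-weight words.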
  • A specific implementation of performing information processing based on the text information and the word weight value of each feature word may be: identifying core words and/or invalid words among the feature words of the text information based on the word weight value of each feature word.
  • A core word is the feature word that best represents the real user intention corresponding to the text information. Searching based on the core word avoids the situation in which other feature words influence the query results so that the recalled results do not match the real user intention, which helps improve the search effect.
  • After the invalid words are determined, the search can be based on the feature words of the text information other than the invalid words. This reduces the recall of invalid content and improves the accuracy of the recalled content.
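  • One possible way to identify core words and invalid words from word weight values is sketched below. The application does not state the identification rule, so this sketch assumes that the core word is the feature word with the highest weight and that invalid words are those whose weight falls below a cutoff; the cutoff value, the weights, and the function name `split_words` are hypothetical.

```python
def split_words(word_weights, invalid_threshold=0.1):
    """Return (core word, invalid words, words to search with)."""
    # Assumed rule: the highest-weight feature word is the core word.
    core_word = max(word_weights, key=word_weights.get)
    # Assumed rule: feature words below the cutoff are treated as invalid.
    invalid_words = [w for w, v in word_weights.items() if v < invalid_threshold]
    search_words = [w for w in word_weights if w not in invalid_words]
    return core_word, invalid_words, search_words


# Hypothetical normalized word weight values.
core, invalid, search_terms = split_words(
    {"family": 0.25, "children's song": 0.7, "video": 0.05}
)
```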
  • An embodiment of the present application provides an information processing apparatus that has the function of implementing the method described in the first aspect. The function can be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function.
  • An embodiment of the present application provides a service device. The service device includes a memory and a processor. The memory stores program instructions, and the processor is connected to the memory through a bus. The processor calls the program instructions stored in the memory so that the service device performs the method described in the first aspect.
  • An embodiment of the present application provides a computer-readable storage medium for storing computer program instructions used by the information processing apparatus according to the second aspect, which include instructions for executing the method according to the first aspect.
  • An embodiment of the present application provides a computer program product. The program product includes a program, and when the program is executed, the method according to the first aspect is implemented.
  • FIG. 1 is a schematic structural diagram of a communication system disclosed in an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of an information processing method disclosed in an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of another information processing method disclosed in an embodiment of the present application.
  • FIG. 3a is a schematic diagram of a scenario for acquiring a target classification category disclosed in an embodiment of the present application.
  • FIG. 3b is a schematic diagram of a scenario for obtaining a target probability value disclosed in an embodiment of the present application.
  • FIG. 3c is a schematic diagram of a scenario of acquiring a target identifier disclosed in an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of another information processing method disclosed in an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an information processing device disclosed in an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a service device disclosed in an embodiment of the present application.
  • FIG. 1 is a schematic structural diagram of a communication system disclosed in an embodiment of the present application.
  • The communication system includes a terminal device 101, a service device 102, and multiple data servers 103.
  • The terminal device 101 may be user equipment (UE), a remote terminal, a mobile terminal, a wireless communication device, a user apparatus, or the like.
  • The user can enter a query term (such as "family children's song video") in the search box displayed on the terminal device 101 through an input device of the terminal device 101 and then click the search button. When the terminal device 101 detects the click on the search button, it sends the query term to the service device 102 over the network (step S101). The service device 102 can analyze and recognize the query term, obtain the word weight value of each feature word of the query term based on the analysis and recognition result, and then acquire search results from the multiple data servers 103 based on those word weight values (step S102). The service device 102 then sends the search results to the terminal device 101, which outputs them on its display screen for the user to choose from as needed.
  • The service device 102 may be composed of a processor, a memory, and a network interface. The service device 102 may be a terminal device or a server; for example, it may be a search engine server.
  • The steps performed by the service device 102 in FIG. 1 can instead be performed by the terminal device 101; that is, the terminal device 101 can analyze and recognize the query term, obtain the word weight value of each feature word of the query term based on the analysis and recognition result, and then obtain the search results from the multiple data servers 103 based on those word weight values.
  • The steps performed by the terminal device 101 in FIG. 1 can instead be performed by the service device 102; that is, the query term received by the service device 102 in FIG. 1 may be obtained by the service device 102 from the user's input operation.
  • FIG. 2 is a schematic flowchart of an information processing method provided by an embodiment of the present application. The method may be applied to a search system or a question answering system. The method may include, but is not limited to, the following steps:
  • Step S201: The service device obtains text information.
  • The text information can be a single word or a sentence composed of multiple words.
  • The text information may be a query term input by a user during a search, and the query term may be entered as text or by voice. If the query term is input by voice, it needs to be converted from voice format to text format.
  • The text information may be a question entered by the user when asking a question, and the question may be entered as text or by voice. If the question is input by voice, it needs to be converted from voice format to text format.
  • In the embodiments of the present application, the text information is taken to be a query term for illustration.
  • The text information may be input by the user on the terminal device and sent by the terminal device to the service device, or the text information may be input by the user directly on the service device.
  • The service device may be a terminal device or a server.
  • Step S202: The service device invokes the text analysis model to analyze and recognize the text information, and obtains the output result of the text analysis model.
  • The text analysis model may be a classification model or a regression model among machine learning models, and may correspond to one or more classification categories.
  • The text analysis model can be used to identify whether the real user intention of the text information belongs to the classification category corresponding to the text analysis model, and the output result can be used to indicate whether to determine the word weight value of each feature word in the text information according to the feature weight value that the text analysis model uses for each feature word during analysis and recognition.
  • The text analysis model may be trained on a large amount of real historical text information and the classification categories fed back by the actual users who input that historical text information, so the text analysis model can be used to identify the real user intention of the text information. For example, when the historical text information is a query term input by a user during a search, searching with the query term returns query results, and the classification category fed back by the actual user can be the category to which the query result selected by the user belongs.
  • The output result of the text analysis model may indicate that the word weight value of each feature word of the text information is to be determined according to the feature weight value that the text analysis model uses for each feature word during analysis and recognition. For example, when the text analysis model recognizes that the real user intention of "family children's song video" belongs to the "children's song" category, the output result indicates that the word weight value of each feature word of "family children's song video" is to be determined according to the feature weight value that the text analysis model uses for each of those feature words during analysis and recognition.
  • The correlation between the word weight value of each feature word of the text information and the output result obtained by analyzing and recognizing the text information is then high; that is, the correlation between the word weight value of each feature word and the real user intention corresponding to the text information is high. In this way, the accuracy of the word weight values of the feature words can be improved.
  • The text analysis model can correspond to multiple classification categories and include multiple discriminators, where each discriminator corresponds to one classification category. The text analysis model analyzes and recognizes the text information through the discriminators, and each discriminator can do so using different feature weight values. In this case, the text analysis model can be used to identify which discriminator's classification category the real user intention of the text information belongs to.
  • For example, the output result of the text analysis model can indicate that the word weight value of each feature word of the text information is to be determined according to the feature weight value that discriminator 1 uses for each feature word during analysis and recognition, where discriminator 1 is one of the discriminators included in the text analysis model.
  • When the text analysis model is a regression model, it can correspond to one classification category and can be used to analyze the probability that the real user intention of the text information belongs to that classification category; that is, the output result of the text analysis model may be a probability value. For example, when the probability value exceeds a first preset probability value threshold, the service device may obtain the feature weight value that the text analysis model uses for each feature word in the text information during analysis and recognition, and determine the word weight value of each feature word based on the acquired feature weight values.
  • The first preset probability value threshold may be set by the service device by default, or may be determined by the service device according to a user's input operation, which is not limited in this embodiment of the present application.
  • A specific implementation in which the service device invokes the text analysis model to analyze and recognize the text information and obtains the output result may be: the service device performs word segmentation on the text information to obtain each feature word of the text information, and uses each feature word as the input of the text analysis model to obtain the output result. In this way, it is only necessary to input the feature words of the text information into the text analysis model to obtain the output result, and then obtain the word weight value of each feature word based on the output result.
  • The process is simple and efficient. When the text information has multiple feature words, the text analysis model needs to be called only once to obtain the word weight value of every feature word.
  • A specific implementation in which the service device performs word segmentation on the text information to obtain each feature word may be: the service device invokes a word segmentation algorithm to segment the text information into words, and determines each resulting word as a feature word of the text information.
  • The word segmentation algorithm may include, but is not limited to, string-matching-based algorithms (such as forward maximum matching, reverse maximum matching, least segmentation, and bidirectional maximum matching), understanding-based algorithms, and statistics-based algorithms, which is not limited in this embodiment of the present application.
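  • Of the string-matching approaches listed above, forward maximum matching can be sketched as follows. This is a minimal illustration, not the application's implementation: the dictionary, the unspaced English sample string (standing in for text without word delimiters), and the parameter `max_len` are hypothetical.

```python
def forward_max_match(text, dictionary, max_len=6):
    """Segment text left to right, always taking the longest dictionary word available."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary word is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is accepted as a fallback so the scan always advances.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical dictionary and unsegmented input.
feature_words = forward_max_match("familysongvideo", {"family", "song", "video"})
```

  • Each resulting segment would then be treated as a feature word and passed to the text analysis model in a single call.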
  • Step S203: Based on the output result, the service device obtains the feature weight value that the text analysis model uses for each feature word in the text information during analysis and recognition.
  • The text analysis model analyzes and recognizes the text information using the feature weight value of each feature word. If the output result indicates that the word weight values are to be determined according to those feature weight values, the service device can obtain the feature weight value that the text analysis model uses for each feature word during analysis and recognition, and determine the word weight value of each feature word in the text information based on the acquired feature weight values.
  • The feature weight value that the text analysis model uses for each feature word during analysis and recognition may be determined during training, or may be set according to empirical values, which is not limited in the embodiments of the present application.
  • Step S204: The service device determines the word weight value of each feature word in the text information based on the obtained feature weight values. Specifically, the service device may use the feature weight value that the text analysis model uses for each feature word as the word weight value of the corresponding feature word. For example, when the text information is "family children's song video" and its feature words are "family", "children's song", and "video", the service device can use the feature weight values that the text analysis model uses for "family", "children's song", and "video" during analysis and recognition as the word weight values of "family", "children's song", and "video" respectively.
  • The traditional way of using a machine learning model treats the feature weight values as internal parameters of the analysis and recognition process and takes the model's output result as the final result. The embodiments of this application instead directly use the parameters of the text analysis model (that is, the feature weight values) as the word weight values of the feature words of the text information, which is essentially different from the traditional way of using machine learning models.
  • The text information may have one or more feature words, and each feature word corresponds to a feature weight value in the text analysis model. A specific implementation in which the service device determines the word weight value of each feature word based on the acquired feature weight values may be: the service device normalizes the acquired feature weight values and uses each normalized feature weight value as the word weight value of the corresponding feature word.
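  • The normalization step can be sketched as follows. The application does not say which normalization is used, so this sketch assumes the simplest option, dividing each feature weight value by the sum of all of them, and further assumes that the weights are non-negative with a non-zero total; the sample weights and the function name `normalize` are hypothetical.

```python
def normalize(feature_weights):
    """Scale feature weight values so that they sum to 1 (assumed normalization)."""
    total = sum(feature_weights.values())
    if total == 0:
        raise ValueError("cannot normalize feature weights that sum to zero")
    return {word: value / total for word, value in feature_weights.items()}


# Hypothetical feature weight values read from the text analysis model.
normalized = normalize({"family": 0.2, "children's song": 2.5, "video": 0.4})
```

  • Other schemes, such as min-max scaling or a softmax, would fit the same description; the choice mainly matters when feature weight values can be negative.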
  • By determining the word weight value of each feature word in the text information in this way, the word weight value of each feature word is highly correlated with the output result obtained by analyzing and recognizing the text information, that is, with the real user intention corresponding to the text information, which helps improve the accuracy of the word weight values of the feature words.
  • FIG. 3 is a schematic flowchart of another information processing method provided by an embodiment of the present application.
  • the method may be applied to a search system or a question and answer system.
  • the method may include but is not limited to the following steps:
  • Step S301 The service device obtains text information. It should be noted that, for the execution process of step S301, reference may be made to the specific description of step S201 in FIG. 2, and details are not described herein.
  • Step S302 The service device invokes a text analysis model to analyze and recognize the text information, and obtains the output result of the text analysis model.
  • the text analysis model includes a discriminator, and the text analysis model analyzes and recognizes the text information through the discriminator.
  • the text analysis model may include one or more discriminators.
  • When the text analysis model includes one discriminator, the output result of the text analysis model may be used to indicate whether to determine that discriminator as the target discriminator; when the text analysis model includes multiple discriminators, the target discriminator can be determined from the multiple discriminators according to the output result of the text analysis model.
  • The text analysis model may be a classification model, and each discriminator in the text analysis model may correspond to a classification category. The recognition result obtained when a discriminator analyzes and recognizes the text information may be a probability value, which characterizes the probability that the text information belongs to the classification category corresponding to the discriminator that outputs it.
  • The output result of the text analysis model may be a probability value obtained by the discriminator in the text analysis model analyzing and recognizing the text information. If the probability value is greater than the second preset probability value threshold, the service device can determine the discriminator in the text analysis model as the target discriminator.
  • the second preset probability value threshold may be set by the service device by default, or may be determined by the service device according to a user's input operation, which is not limited in this embodiment of the present application.
  • The output result of the text analysis model may include the target classification category. In this case, the service device may determine the discriminator corresponding to the target classification category as the target discriminator based on the correspondence between classification categories and discriminators.
  • different discriminators in the text analysis model correspond to different classification categories.
  • The target classification category can be determined according to the recognition results obtained by each discriminator of the text analysis model analyzing the text information, and the recognition result obtained by each discriminator can be a probability value.
  • For example, suppose the text analysis model includes three discriminators (discriminator 1, discriminator 2 and discriminator 3) that correspond to the tools, learning and children's song categories respectively, and the probability values the three discriminators obtain by analyzing and recognizing the text information "family children's song video" are 0.1, 0.2 and 0.95 respectively. The output result of the text analysis model can then include the category corresponding to the discriminator that outputs the maximum probability value of 0.95, that is, the target classification category may be the children's song category.
  • The service device may then use discriminator 3, which corresponds to the children's song category, as the target discriminator.
  • The target classification category can be used to characterize the real user intent of the text information. By determining the discriminator corresponding to the target classification category as the target discriminator, and then determining the word weight value of each feature word in the text information based on the feature weight value the target discriminator uses for that feature word during analysis and recognition, the accuracy of the word weight values can be improved.
  • The output result of the text analysis model may include the target probability value. In this case, the service device may determine the discriminator that outputs the target probability value as the target discriminator.
  • Different discriminators in the text analysis model obtain different probability values when analyzing and recognizing the text information, and the target probability value may be the maximum of the probability values output by the discriminators.
  • In the example above, the output result of the text analysis model can include the maximum of the three probability values, that is, the target probability value can be 0.95. The service device can then use discriminator 3, which outputs the target probability value of 0.95, as the target discriminator. Using the discriminator that outputs the maximum probability value as the target discriminator improves the accuracy of the determined target discriminator.
  • the output result of the text analysis model may include a target identifier.
  • the service device may determine the discriminator corresponding to the target identifier as the target discriminator.
  • the target identifier is used to uniquely identify a discriminator.
  • The target identifier may be determined according to the recognition results obtained by each discriminator of the text analysis model analyzing the text information, and the recognition result obtained by each discriminator can be a probability value.
  • In the example above, the output result of the text analysis model can include the identifier of the discriminator that outputs the maximum probability value of 0.95, that is, the target identifier may be identifier 3.
  • The service device may then use discriminator 3, which corresponds to identifier 3, as the target discriminator.
  • Step S303 The service device determines the target discriminator from the discriminators included in the text analysis model according to the output result, and obtains the feature weight value used by the target discriminator for each feature word in the text information during analysis and recognition. Specifically, if the output result includes the target classification category, the service device may determine the discriminator corresponding to the target classification category as the target discriminator based on the correspondence between classification categories and discriminators; if the output result includes the target probability value, the service device may determine the discriminator that outputs the target probability value as the target discriminator according to the probability value output by each discriminator; if the output result includes the target identifier, the service device may determine the discriminator corresponding to the target identifier as the target discriminator based on the correspondence between identifiers and discriminators.
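  • The three ways of determining the target discriminator can be sketched as follows. This is only an illustration: the function name, the dictionary-based representation of the output result and the key names are assumptions, while the categories and probability values are those of the three-discriminator example in this description.

```python
def determine_target_discriminator(output, probabilities, categories):
    """Return the identifier of the target discriminator.

    output        -- dict holding one of "category", "probability" or "identifier"
    probabilities -- {discriminator identifier: probability value it output}
    categories    -- {discriminator identifier: classification category}
    """
    if "category" in output:
        for identifier, category in categories.items():
            if category == output["category"]:
                return identifier
    elif "probability" in output:
        for identifier, probability in probabilities.items():
            if probability == output["probability"]:
                return identifier
    elif "identifier" in output and output["identifier"] in categories:
        return output["identifier"]
    raise ValueError("output result does not match any discriminator")


categories = {1: "tools", 2: "learning", 3: "children's song"}
probabilities = {1: 0.1, 2: 0.2, 3: 0.95}

by_category = determine_target_discriminator(
    {"category": "children's song"}, probabilities, categories)
by_probability = determine_target_discriminator(
    {"probability": max(probabilities.values())}, probabilities, categories)
by_identifier = determine_target_discriminator(
    {"identifier": 3}, probabilities, categories)
```

  • In all three cases the sketch selects discriminator 3, which matches the example in which the children's song discriminator outputs the maximum probability value.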
  • The target discriminator analyzes and recognizes the text information based on the feature weight value it uses for each feature word in the text information. The feature weight value used for each feature word in the text information can be determined during the training of the target discriminator.
  • When analyzing and recognizing the text information, different discriminators may use different feature weight values for every feature word in the text information; alternatively, different discriminators may use different feature weight values for some feature words in the text information and the same feature weight values for the other feature words. This is not limited in the embodiments of the present application.
  • The embodiment of the present application determines the target discriminator from the discriminators included in the text analysis model according to the output result of the text analysis model, rather than selecting the target discriminator from the text analysis model at random, which helps improve the accuracy of the word weight values of the feature words determined based on the target discriminator.
  • Step S304 The service device uses the feature weight value used by the target discriminator for each feature word in the text information during analysis and recognition as the word weight value of the corresponding feature word in the text information.
  • The service device may record, in a database, the feature weight values used by each discriminator for each feature word in the text information when analyzing and recognizing the text information. After determining the target discriminator, the service device can then extract from the database the feature weight values used by the target discriminator for each feature word in the text information during analysis and recognition, and directly use the extracted feature weight values as the word weight values of the corresponding feature words. In this way, the efficiency of determining the word weight values can be improved.
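  • A minimal sketch of this lookup is given below. It assumes the recorded feature weight values are held in an in-memory dictionary keyed by discriminator identifier; the storage layout, the function name and the numeric values are hypothetical.

```python
# Hypothetical record: discriminator identifier -> {feature word: feature weight value}.
recorded_feature_weights = {
    1: {"family": 0.3, "children's song": 0.1, "video": 0.2},
    2: {"family": 0.4, "children's song": 0.3, "video": 0.1},
    3: {"family": 0.2, "children's song": 1.5, "video": 0.6},
}


def word_weights_from_target(feature_words, target_identifier, recorded):
    """Use the target discriminator's feature weights directly as word weights."""
    target_weights = recorded[target_identifier]
    # A feature word with no recorded weight gets 0.0 (an assumption of this sketch).
    return {word: target_weights.get(word, 0.0) for word in feature_words}


word_weights = word_weights_from_target(
    ["family", "children's song", "video"], 3, recorded_feature_weights)
```

  • Because the weights are only read back, no additional model inference is needed once the target discriminator is known.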
  • Step S305 The service device performs information processing based on the text information and the word weight value of each feature word of the text information. Based on the text information and the word weight values of the feature words of the text information, an information processing result more in line with the user's intention can be obtained.
  • When the method shown in FIG. 3 is applied to a search system, the text information may be a query statement, and the service device searches for the query statement according to the word weight value of each feature word in the query statement, so that the recalled search results better match the user's search needs.
  • When the method shown in FIG. 3 is applied to a question and answer system, the text information may be a question, and the service device searches for the question according to the word weight value of each feature word in the question, so that an answer more in line with the user's intent can be obtained.
  • The text information may be query information, and a specific implementation in which the service device performs information processing based on the text information and the word weight value of each feature word of the text information may be: the service device searches based on the query information and the word weight value of each feature word of the query information to obtain the first query result of the query information, and outputs the first query result.
  • the first query result may be obtained by the service device after performing weighted processing on the feature words of the query information according to the weight value of each feature word of the query information.
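  • One possible form of such weighted processing is sketched below. The scoring rule (summing the word weight values of the feature words that occur in a candidate), the substring matching and all names and data are assumptions made for illustration; the application does not prescribe a particular scoring function.

```python
def weighted_search(word_weights, documents, top_k=3):
    """Rank documents by the total word weight of the feature words they contain."""
    scored = []
    for doc_id, text in documents.items():
        score = sum(weight for word, weight in word_weights.items() if word in text)
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document identifier.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


# Hypothetical word weight values and candidate content.
word_weights = {"family": 0.1, "children's song": 0.6, "video": 0.3}
documents = {
    "d1": "children's song video collection",
    "d2": "family video editing tool",
    "d3": "weather forecast",
}
first_query_result = weighted_search(word_weights, documents)
```

  • In this sketch the candidate containing the heavily weighted feature word "children's song" is ranked ahead of the one that only matches "family" and "video", and the unrelated candidate is not recalled.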
  • The text information may be query information, and a specific implementation in which the service device performs information processing based on the text information and the word weight value of each feature word of the text information may also be: searching according to the query information to obtain the second query result, sorting the second query result based on the word weight value of each feature word of the query information, and outputting the sorted second query result.
  • The second query result may be obtained by searching based on the word weight value of each feature word of the query information; or, if the word weight values of the feature words of the query information cannot be obtained through the text analysis model when the service device searches based on the query information, the second query result may be obtained by searching based on a default weight value set for each feature word in the query information.
  • the default weight value may be set by the service device according to a preset experience value, or may be a word weight value calculated according to the TF-IDF algorithm, which is not limited in this embodiment of the present application.
  • By sorting the second query result based on the word weight value of each feature word of the query information, the service device can preferentially output the second query results that better match the user's search needs, that is, display those results to the user first, which can effectively improve the search effect.
  • The text information may be query information, and a specific implementation in which the service device performs information processing based on the text information and the word weight value of each feature word of the text information may also be: searching according to the query information to obtain the second query result, normalizing the word weight values of the feature words of the query information, sorting the second query result based on the normalized word weight values of the feature words, and outputting the sorted second query result.
  • For example, the normalized word weight values of feature word 1, feature word 2 and feature word 3 are 1.2/2.2, 0.8/2.2 and 0.2/2.2, respectively, that is, each word weight value (1.2, 0.8 and 0.2) divided by their sum (2.2).
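  • The sorting of the second query result can be sketched with the weights 1.2, 0.8 and 0.2 from the example. The result records, the "matched_words" field and the scoring rule are hypothetical, and the sketch assumes the word weight values have a positive sum.

```python
def rerank(second_query_results, word_weights):
    """Sort results so those matching heavier feature words come first."""
    total = sum(word_weights.values())
    normalized = {word: value / total for word, value in word_weights.items()}

    def score(result):
        return sum(normalized.get(word, 0.0) for word in result["matched_words"])

    return sorted(second_query_results, key=score, reverse=True), normalized


word_weights = {"feature word 1": 1.2, "feature word 2": 0.8, "feature word 3": 0.2}
second_query_results = [
    {"id": "result A", "matched_words": ["feature word 3"]},
    {"id": "result B", "matched_words": ["feature word 1", "feature word 2"]},
    {"id": "result C", "matched_words": ["feature word 2", "feature word 3"]},
]
ranked, normalized = rerank(second_query_results, word_weights)
```

  • Here result B scores 2.0/2.2, result C scores 1.0/2.2 and result A scores 0.2/2.2, so the results are output in the order B, C, A.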
  • A specific implementation in which the service device performs information processing based on the text information and the word weight value of each feature word of the text information may also be: identifying the core words and/or invalid words among the feature words of the text information based on the word weight value of each feature word of the text information.
  • the service device may use the feature word with the largest weight value as the core word of the text information among the feature words of the foregoing text information, and perform a search based on the core word.
  • The core word is the feature word that best represents the real user intention corresponding to the text information. Searching based on the core word prevents other feature words from affecting the query results and causing the recalled query results to miss the real user intention corresponding to the text information, which helps improve the search effect.
  • The service device may also use the feature word with the largest weight value among the feature words of the query information as the core word of the query information, obtain synonyms of the core word, and then search based on the core word and the synonyms.
  • By expanding the synonyms of the core word, and then searching based on the core word and its synonyms, more query results can be recalled, thereby providing users with more choices. For example, when the query information is "what software is good for the US Basketball" and "US Basketball" is the core word of the query information, expanding the synonyms of the core word yields "NBA", and searching based on "US Basketball" and "NBA" can recall more query results.
  • the service device can obtain synonyms of the core word, and then search again based on the core words and synonyms, and output the query results obtained after the search again.
  • The service device may store a synonym database in advance and query the synonym database to obtain the synonyms of the core word. If the synonyms of the core word do not exist in the synonym database, the service device may request the cloud server to acquire the synonyms of this core word.
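  • The synonym lookup with a cloud fallback can be sketched as follows. The dictionary-based synonym database and the caller-supplied `request_from_cloud` function are stand-ins; no real cloud interface is implied.

```python
def expand_with_synonyms(core_word, synonym_database, request_from_cloud):
    """Return the core word followed by its synonyms."""
    synonyms = synonym_database.get(core_word)
    if synonyms is None:
        # Not stored locally: ask the cloud server (here a caller-supplied function).
        synonyms = request_from_cloud(core_word)
    return [core_word] + list(synonyms)


synonym_database = {"US Basketball": ["NBA"]}


def fake_cloud_lookup(word):
    # Stand-in for a real request; this stub knows no synonyms.
    return []


search_terms = expand_with_synonyms("US Basketball", synonym_database, fake_cloud_lookup)
```

  • The returned list, here "US Basketball" followed by "NBA", is what the search would then be based on.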
  • the service device may determine the feature word whose weight value is less than a preset weight value threshold among the feature words of the query information as an invalid word.
  • The service device can search based on the feature words of the query information other than the invalid words, which reduces the recall of invalid content and improves the accuracy of the recalled content.
  • the preset weight value threshold may be set by the service device by default, or may be determined by the service device according to the user's input operation, which is not limited in this embodiment of the present application.
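  • Identifying the core word and the invalid words from the word weight values can be sketched as below. The word weight values and the threshold of 0.1 are hypothetical, as is the split of the example query into these feature words.

```python
def identify_core_and_invalid(word_weights, weight_threshold):
    """Return (core word, invalid words, remaining feature words)."""
    # Core word: the feature word with the largest word weight value.
    core_word = max(word_weights, key=word_weights.get)
    # Invalid words: feature words whose weight is below the preset threshold.
    invalid_words = [w for w, v in word_weights.items() if v < weight_threshold]
    remaining_words = [w for w in word_weights if w not in invalid_words]
    return core_word, invalid_words, remaining_words


word_weights = {"what": 0.03, "software": 0.25, "good": 0.02, "US Basketball": 0.7}
core_word, invalid_words, remaining_words = identify_core_and_invalid(word_weights, 0.1)
```

  • With these values the search would be based on "software" and "US Basketball" only, with "US Basketball" as the core word.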
  • By determining the target discriminator among the discriminators included in the text analysis model, and then using the feature weight value that the target discriminator uses for each feature word in the text information during analysis and recognition as the word weight value of the corresponding feature word of the text information, the accuracy of the word weight values of the feature words can be improved.
  • an information processing result more in line with the user's intention can be obtained.
  • FIG. 4 is a schematic flowchart of another information processing method provided by an embodiment of the present application.
  • the method may be applied to a search system or a question and answer system.
  • the method may include but is not limited to the following steps:
  • Step S401 The service device obtains text information.
  • It should be noted that, for the execution process of step S401, reference may be made to the specific description of step S201 in FIG. 2, and details are not described herein.
  • Step S402 The service device invokes the text analysis model, analyzes and recognizes the text information through each discriminator in the text analysis model, and obtains the target classification category output by the text analysis model, where the text analysis model is a classification model that includes multiple discriminators, each discriminator corresponds to a classification category, and the recognition result obtained by each discriminator through analysis and recognition is a probability value.
  • the probability value output by each discriminator can be used to characterize the probability that the text information belongs to the classification category corresponding to the discriminator that outputs the probability value.
  • The service device may determine the maximum probability value among the probability values output by the discriminators in the text analysis model, and use the classification category corresponding to the discriminator that outputs the maximum probability value as the target classification category, where the target classification category can be used to characterize the real user intent of the text information.
  • A discriminator analyzes and recognizes the text information through the feature weight value of each feature word in the text information. When different discriminators analyze and recognize the same text information, the feature weight values they use for each feature word may all be different, or they may use different feature weight values for some feature words and the same feature weight values for the other feature words. Therefore, different discriminators obtain different probability values when analyzing and recognizing the same text information.
  • different discriminators in the text analysis model correspond to different classification categories.
  • the text analysis model may be trained based on training sample data.
  • A specific implementation in which the service device trains the text analysis model may be: the service device obtains training sample data that includes historical text information and annotation information, and trains the preset model based on the historical text information and annotation information to obtain the aforementioned text analysis model.
  • the preset model is a model that has not been trained yet.
  • The historical text information may be the query terms (i.e., historical query terms) previously entered by the user during query search, and the annotation information may be determined based on the classification category to which a search result obtained by searching on the historical query term belongs. For example, if searching on the historical query term returns 3 search results and the user selects one of them, the annotation information is the classification category to which the selected search result belongs.
  • The historical text information may be a question (i.e., a historical question) that the user previously entered, and the annotation information may be determined based on the classification category to which an answer obtained by searching for the historical question belongs. For example, if searching for the historical question returns 3 answers and the user selects one of them, the annotation information is the classification category to which the selected answer belongs.
  • The preset model may set initial feature weight values for each feature word in the text information, and the service device may optimize, based on the historical text information and annotation information, the initial feature weight values that the preset model sets for each feature word, to obtain the aforementioned text analysis model.
  • The historical text information may be historical query information, and the annotation information may be determined based on user operation data of the query results obtained by querying the historical query information. The service device may obtain the historical query information and the user operation data of the query results obtained by querying it, automatically determine the annotation information based on that user operation data, and train the preset model based on the historical query information and annotation information to obtain the aforementioned text analysis model. The historical query information is real query information entered by users in the past, and the user operation data is obtained from the users' real operations after entering that query information; that is, the user operation data is obtained from user feedback data. In other words, the text analysis model is trained on real user feedback data.
  • As a result, the target classification category obtained when the text analysis model analyzes and recognizes the query information better matches the real user intent corresponding to the query information, and the word weight value of each feature word obtained based on the target classification category more objectively reflects the user's real search needs.
  • the service device can store all received query information (including historical query information and current query information) in a log file, and accordingly, the service device can query the log file to obtain a large amount of historical query information.
  • the service device searches the historical query information to obtain one or more search results.
  • The user can select the search results he or she needs from all the obtained search results, and different users who enter the same historical query information and receive the same search results may select the same or different search results.
  • the service device may determine the search result selected by the user as the query result obtained by querying the historical query information, and the number of query results obtained by querying the historical query information may be multiple.
  • the user operation data of the query result obtained by querying the historical query information may include: the query result obtained by querying the historical query information, the number of times each query result is selected, and the classification category to which each query result belongs.
  • The number of selections of each query result can be obtained by counting the number of times each query result is clicked, browsed, or otherwise operated on by users.
  • The service device may pre-store the classification category to which each query result obtained by querying the historical query information belongs. For example, a video server stores a large amount of video content, and when storing each piece of video content it sets a corresponding classification category for that content, so that subsequent users can search based on the classification category of the video content to obtain video content more in line with their needs.
  • Similarly, an application download server, e-commerce server or other server sets and stores the classification category of each piece of content. In addition, in a question and answer dialogue system, a historical user question can also be used as historical query information, and the category of the operation the user selects after entering the historical user question is used as the classification category corresponding to that question. The category of the selected operation can be used to characterize the user's real intention in entering the historical user question.
  • For example, if the two options the question and answer dialogue system provides after the historical user question is entered are camera settings and system settings, and the user selects camera settings, the user's real intention in entering the historical user question is to set the camera parameters.
  • the preset model may be a multi-classification model, and the preset model may be a multi-classification model of one vs. rest mode.
  • The preset model may be a Support Vector Machine (SVM), linear SVM, Logistic Regression (LR), Gradient Boosting Decision Tree (GBDT), Random Forest (RF), Sparse Tree (ST) or other model, which is not limited in the embodiments of this application.
  • A specific implementation in which the service device trains the preset model based on historical text information and annotation information to obtain the foregoing text analysis model may be: the service device inputs historical query information as training data into the preset model to obtain a training result, and optimizes the preset model according to the training result and the annotation information to obtain the text analysis model.
  • The annotation information may be a first classification category determined according to user operation data. Specifically, the service device inputs the historical query information as training data into the preset model, so that the preset model predicts the real user intention corresponding to the historical query information, and uses the predicted category as the training result. If the predicted category and the first classification category are inconsistent, the predicted category is inaccurate and the preset model needs to be optimized, so that the category the optimized preset model predicts for the real user intention of the historical query information is consistent with the first classification category.
  • The first classification category may be the classification category to which the most frequently selected query result belongs among the query results obtained by querying the historical query information, or the classification category whose query results have the largest total number of selections.
  • After the service device determines the search results selected by users as the query results obtained by querying the historical query information, it can also automatically calculate the total number of selections of all query results belonging to the same classification category, compare these totals across classification categories to obtain the maximum, and determine the classification category corresponding to the maximum as the first classification category. The first classification category can be used to characterize the real user intent of the aforementioned historical query information.
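  • The automatic determination of the first classification category can be sketched as follows, using the variant that totals the selections per category. The function name and the selection counts are hypothetical and are not the values of Table 1.

```python
from collections import Counter


def first_classification_category(query_results):
    """query_results: iterable of (query result, times selected, classification category)."""
    totals = Counter()
    for _result, times_selected, category in query_results:
        totals[category] += times_selected
    # The category with the largest total number of selections.
    return max(totals, key=totals.get)


# Hypothetical user operation data for one piece of historical query information.
user_operation_data = [
    ("query result 1", 50, "category 1"),
    ("query result 2", 60, "category 2"),
    ("query result 3", 40, "category 1"),
]
label = first_classification_category(user_operation_data)
```

  • In this data the single most selected query result belongs to category 2, but category 1 has the larger total (90 versus 60), so the two definitions of the first classification category can differ and the choice between them matters.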
  • Category 1 can be determined as the first classification category, and the real intent of the user who entered query information 1 can be considered to be obtaining content whose classification category is category 1. The service device can thus determine the first classification category automatically, without manual labeling, which reduces the cost of model training and also avoids the situation in which a manually labeled first classification category, influenced by the labeler's subjectivity, fails to accurately reflect the real intent of the user who entered the query information.
  • Table 1 The selection times of query results and the classification categories to which they belong
  • the preset model may include multiple initial discriminators, and the foregoing training result may be obtained by analyzing and identifying historical query information by the initial discriminator in the preset model.
  • A specific implementation in which the service device performs parameter optimization on the preset model according to the foregoing training result and annotation information may be: optimizing, according to the training result and the first classification category, the initial feature weight values used by the initial discriminators in the preset model.
  • The aforementioned training result may be determined by the preset model according to the recognition results obtained by each initial discriminator analyzing and recognizing the historical query information. Each initial discriminator may analyze and recognize the historical query information based on the initial feature weight values set for each feature word of the historical query information, and the recognition result obtained by each initial discriminator can be a probability value. The probability value represents the probability that the real user intention corresponding to the historical query information is the classification category corresponding to the initial discriminator that outputs the probability value. The aforementioned training result may be the classification category corresponding to the initial discriminator that outputs the maximum probability value.
  • The service device may optimize the initial feature weight values used by some of the initial discriminators in the preset model. For example, the service device may modify the initial feature weight values set by the target initial discriminator for each feature word of the historical query information, where the target initial discriminator corresponds to the predicted category, so that, based on the modified initial feature weight values, the preset model analyzes and recognizes the historical query information and obtains a classification category that is the same as the first classification category. In an implementation manner, if the aforementioned predicted category is different from the first classification category, the service device may also optimize the initial feature weight values used by all the initial discriminators in the preset model, which is not limited in the embodiments of the present application.
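  • The optimization of initial feature weight values can be illustrated with a small one-vs-rest sketch. The update rule is a perceptron-style rule chosen here purely for illustration, since the application does not specify how the weights are modified; the sigmoid scoring, the learning rate and all names are likewise assumptions.

```python
import math


def predict(discriminators, feature_words):
    """Each discriminator scores the feature words; return (predicted category, probabilities)."""
    probabilities = {}
    for category, weights in discriminators.items():
        score = sum(weights.get(word, 0.0) for word in feature_words)
        probabilities[category] = 1.0 / (1.0 + math.exp(-score))
    return max(probabilities, key=probabilities.get), probabilities


def train_step(discriminators, feature_words, first_category, learning_rate=0.5):
    """Adjust feature weights only when the prediction differs from the label."""
    predicted, _ = predict(discriminators, feature_words)
    if predicted != first_category:
        for word in feature_words:
            right = discriminators[first_category]
            wrong = discriminators[predicted]
            right[word] = right.get(word, 0.0) + learning_rate
            wrong[word] = wrong.get(word, 0.0) - learning_rate
    return predicted


# Initial discriminators with all initial feature weight values equal to 0.
discriminators = {"tools": {}, "learning": {}, "children's song": {}}
sample_words = ["family", "children's song", "video"]

before = train_step(discriminators, sample_words, "children's song")
after, _ = predict(discriminators, sample_words)
```

  • With all weights initially 0 the three probabilities tie at 0.5 and the first category, "tools", is predicted; after one update the children's song discriminator outputs the maximum probability value, so the prediction agrees with the first classification category.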
  • The service device can obtain training data and annotation information automatically and complete model training without manually labeled data, which helps reduce the cost of model training.
  • The service device can also optimize the model automatically, which effectively improves prediction accuracy.
  • The service device can obtain test sample data that contains the same kinds of data as the training sample data, that is, a large amount of historical query information and the category of each piece of historical query information. Note that when the text analysis model is trained, the historical query information in the test sample data is not input into the preset model as training data.
  • The service device can invoke the text analysis model to analyze and identify each piece of historical query information in the test sample data, and compare the target classification category output by the text analysis model with the category of the corresponding historical query information in the test sample data. If the two differ, the prediction is incorrect.
  • The service device can optimize the parameters of the text analysis model so that the prediction accuracy obtained when the optimized text analysis model predicts the test sample data is greater than or equal to a preset accuracy threshold.
  • The preset accuracy threshold may be set by the service device by default or determined by the service device according to a user's input operation, which is not limited in this embodiment of the present application.
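  • The accuracy check on the test sample data can be sketched as below. The function names, the `(query, category)` sample layout, and the default threshold of 0.9 are assumptions made for illustration; `predict` stands for any callable that wraps the text analysis model.

```python
def prediction_accuracy(predict, test_samples):
    """Fraction of test samples whose predicted category equals the stored one.

    test_samples: non-empty list of (historical_query, category) pairs.
    """
    correct = sum(1 for query, category in test_samples
                  if predict(query) == category)
    return correct / len(test_samples)


def meets_threshold(predict, test_samples, accuracy_threshold=0.9):
    """True when the accuracy is greater than or equal to the preset threshold."""
    return prediction_accuracy(predict, test_samples) >= accuracy_threshold
```

  • If `meets_threshold` returns False, the parameters would be optimized further and the check repeated.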
  • Step S403: Based on the correspondence between classification categories and discriminators, the service device uses the discriminator corresponding to the target classification category as the target discriminator, and obtains the feature weight values that the target discriminator used for each feature word in the text information during analysis and recognition.
  • The target discriminator may be the discriminator that outputs the aforementioned maximum probability value.
  • Each discriminator in the text analysis model analyzes and recognizes the text information using the feature weight values set for the feature words in the text information. The feature weight values used by each discriminator are determined by that discriminator during training.
  • Different discriminators may use different feature weight values for each feature word in the text information during analysis and recognition.
  • Each discriminator included in the text analysis model can be used to identify text information of a different classification category. In other words, when a discriminator analyzes text information that belongs to its own classification category, the probability value it outputs is larger than the probability values that the other discriminators in the text analysis model output for the same text information.
  • For example, suppose the text analysis model includes two discriminators: a first discriminator used to recognize text information of the "video category" and a second discriminator used to recognize text information of the "children's category". When the text information is "family children's song video" and its intent category is the "children's category", the probability value that the second discriminator outputs after analyzing and recognizing "family children's song video" will be greater than the probability value that the first discriminator outputs for the same text. When the text information is "video ad skip" and its intent category is the "video category", the probability value that the first discriminator outputs after analyzing and recognizing "video ad skip" will be greater than the probability value that the second discriminator outputs for the same text.
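  • The two-discriminator example above can be sketched as follows. The sketch assumes that each discriminator is a simple logistic scorer over per-word feature weights, which the text does not state, and every weight value in `DISCRIMINATOR_WEIGHTS` is invented purely for illustration.

```python
import math

# Hypothetical feature weights, one table per discriminator (illustrative only).
DISCRIMINATOR_WEIGHTS = {
    "video category": {
        "video": 2.0, "ad": 1.0, "skip": 0.5,
        "family": -0.5, "children's song": -1.5,
    },
    "children's category": {
        "video": 0.2, "ad": -1.0, "skip": -0.5,
        "family": 1.0, "children's song": 2.5,
    },
}


def discriminator_probability(weights, feature_words):
    """Sigmoid of the summed feature weights of the words in the query."""
    score = sum(weights.get(word, 0.0) for word in feature_words)
    return 1.0 / (1.0 + math.exp(-score))


def select_target_discriminator(feature_words):
    """Return the category whose discriminator outputs the maximum probability,
    together with all per-discriminator probabilities."""
    probabilities = {
        category: discriminator_probability(weights, feature_words)
        for category, weights in DISCRIMINATOR_WEIGHTS.items()
    }
    target = max(probabilities, key=probabilities.get)
    return target, probabilities
```

  • With these assumed weights, the segmented query `["family", "children's song", "video"]` selects the "children's category" discriminator, while `["video", "ad", "skip"]` selects the "video category" discriminator.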
  • The same feature word in text information of different classification categories has different feature weight values in the different discriminators included in the text analysis model. Because the discriminators analyze and identify text information of different classification categories by means of feature weight values, these differing feature weight values allow the discriminators to accurately identify the classification category to which the text information belongs. Specifically, after the service device invokes the text analysis model to analyze and identify the text information, it can obtain the target classification category, that is, the intent category of the text information.
  • The service device determines the target discriminator based on the target classification category output by the text analysis model and determines the word weight value of each feature word in the text information from that discriminator, so that the same feature word can receive different word weight values in text information belonging to different intent categories. Searching based on such word weight values can yield search results that are more in line with the user's search needs.
  • Take the feature words "video" and "skin" as examples. In query terms that belong to different intent categories (that is, classification categories), the same feature word has a different degree of importance to the content that the user actually wants to obtain.
  • The word weight values determined for "video" and "skin" in query terms belonging to different intent categories are shown in Table 2 and Table 3, respectively.
  • Table 2 The word weight value of the feature word "video" in query terms belonging to different intent categories
  • In Table 2, all 4 query terms include the feature word "video", but the 4 query terms belong to different intent categories, so the feature word "video" has a different word weight value in each of them.
  • When the query term is "video ad skip", the word weight value of the feature word "video" is the highest. If the intent category of a query term cannot be identified, a feature word has the same word weight value in query terms belonging to different intent categories. For example, if the input query term is "family children's song video", the word weight value obtained for the feature word "video" is the same as its word weight value in the query term "video ad skip".
  • In that case, the word weight value of the feature word "video" is much larger than the word weight values of the other feature words in the query term "family children's song video", so a large share of the search results for "family children's song video" are video applications. Because there are more video applications than children's song applications, video applications tend to be shown first, and a user who enters "family children's song video" cannot get the children's song application that he or she actually needs. Conclusions consistent with Table 2 can be drawn from Table 3.
  • By invoking the text analysis model and having each of its discriminators analyze and recognize the text information, the true intent category of the text information can be obtained, and the word weight values of the feature words in the text information under that intent category can then be obtained, which improves the accuracy of the word weight values. Searching for the text information based on these word weight values can yield search results that better match the user's real needs.
  • Step S404: The service device determines the feature weight value that the target discriminator used for each feature word in the text information during analysis and recognition as the word weight value of the corresponding feature word in the text information.
  • Each discriminator of the text analysis model analyzes and identifies text information by means of feature weight values, and text information can be assigned to different classification categories through different feature weight values; that is, each discriminator distinguishes text information of different classification categories by its feature weights.
  • Searching for the text information with these word weight values makes it possible to filter out search results that do not belong to the target classification category and keep those that do, that is, the search results that truly meet the user's search needs.
  • The feature weight values used by each discriminator may be determined by that discriminator during training.
  • The service device can obtain a large amount of historical query information and perform word segmentation on each piece of historical query information to obtain its feature words. All the feature words of the historical query information can form a feature word dictionary.
  • The service device can use each feature word in the feature word dictionary as a one-dimensional feature.
  • The service device can obtain the encoding of each feature word in the query information, combine the encodings of all feature words in the query information to obtain the feature vector of the query information, and input the feature vector into the preset model for training.
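  • One way to read the dictionary and encoding steps above is sketched below. It assumes that word segmentation has already been performed and that the per-word encodings are one-hot vectors combined into a multi-hot feature vector; the text does not fix the encoding, and the function names are hypothetical.

```python
def build_feature_dictionary(segmented_queries):
    """Assign each distinct feature word one dimension, in first-seen order."""
    dictionary = {}
    for words in segmented_queries:
        for word in words:
            if word not in dictionary:
                dictionary[word] = len(dictionary)
    return dictionary


def encode_query(words, dictionary):
    """Combine the one-hot encodings of a query's feature words into one
    multi-hot feature vector; words outside the dictionary are ignored."""
    vector = [0] * len(dictionary)
    for word in words:
        index = dictionary.get(word)
        if index is not None:
            vector[index] = 1
    return vector
```

  • For example, a dictionary built from `[["family", "children's song", "video"], ["video", "ad", "skip"]]` has five dimensions, and the query `["video", "ad", "skip"]` is encoded as `[0, 0, 1, 1, 1]`.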
  • The initial discriminator in the preset model sets an initial feature weight value in advance for each dimension feature in the feature word dictionary.
  • The initial discriminator can analyze and identify the query information based on the initial feature weight values set for the feature words in the query information.
  • The service device may also set a unique feature identifier for each feature word in the feature word dictionary.
  • The service device may obtain the feature identifier of each feature word in the historical query information based on the feature word dictionary, and then input the feature identifiers of the feature words in the historical query information into the preset model as training data for training.
  • In this case, the initial discriminator in the preset model sets an initial feature weight value in advance for each feature identifier in the feature word dictionary, and can likewise analyze and identify the query information based on the initial feature weight values set for the feature words in the query information.
  • The service device may determine the word weight value of each feature word in the query information, based on the feature weight values that the target discriminator used for the feature words in the query information during analysis and recognition, as follows. The service device determines whether each feature word of the query information exists in the feature word dictionary. If a first feature word of the query information exists in the feature word dictionary, the service device obtains the feature weight value that the target discriminator used for the first feature word during analysis and recognition and determines that feature weight value as the word weight value of the first feature word. If a second feature word of the query information does not exist in the feature word dictionary, the service device obtains a default value or the inverse document frequency (IDF) of the second feature word and determines it as the word weight value of the second feature word.
  • The default value may be set by the service device by default or set by the service device according to a preset empirical value, which is not limited in this embodiment of the present application.
  • The service device may determine the IDF of the second feature word as the word weight value of the second feature word.
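  • The dictionary lookup with a fallback for unknown feature words can be sketched as follows. The sketch assumes that the keys of `target_weights` stand in for the feature word dictionary, that the default value is 1.0, and that IDF is computed with one common smoothed formula; none of these choices comes from the text.

```python
import math


def smoothed_idf(word, documents):
    """One common smoothed IDF variant: log((1 + N) / (1 + df)) + 1."""
    containing = sum(1 for doc in documents if word in doc)
    return math.log((1 + len(documents)) / (1 + containing)) + 1.0


def word_weight_values(feature_words, target_weights, documents=None,
                       default_weight=1.0):
    """Word weight = the target discriminator's feature weight when the word is
    in the dictionary; otherwise the IDF (if a corpus is given) or a default."""
    weights = {}
    for word in feature_words:
        if word in target_weights:
            weights[word] = target_weights[word]
        elif documents is not None:
            weights[word] = smoothed_idf(word, documents)
        else:
            weights[word] = default_weight
    return weights
```

  • Passing `documents=None` selects the default-value branch; passing a segmented corpus selects the IDF branch.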
  • Step S405: The service device performs information processing based on the text information and the word weight value of each feature word of the text information.
  • When the method described in FIG. 4 is applied to a search system with the text information used as the query term, the search effect for the query term is tested, with word weights obtained by the IDF method used for comparison.
  • The query results obtained by searching for the query term with the word weight values obtained by the method described in FIG. 4 and with the word weight values obtained by the IDF method are shown in Table 4.
  • The query results obtained with the word weight values from the method described in FIG. 4 are applications related to the appearance of bears, whereas the query results obtained with the word weight values from the IDF method are applications related to bumping. The method proposed in the embodiment of the present application can therefore effectively improve the accuracy of the query results.
  • Table 4 Query results obtained by searching for the query term with the word weight values obtained by the method described in FIG. 4 and with the word weight values obtained by the IDF method
  • FIG. 5 is a schematic structural diagram of an information processing apparatus provided by an embodiment of the present application.
  • the information processing apparatus 50 is used to execute the steps performed by the service device in the method embodiment corresponding to FIGS.
  • The apparatus 50 may include:
  • an obtaining module 501, used to obtain text information;
  • an analysis module 502, used to invoke the text analysis model to analyze and recognize the text information and obtain the output result of the text analysis model;
  • the obtaining module 501, further used to obtain, according to the output result, the feature weight values that the text analysis model used for each feature word in the text information during analysis and recognition; and
  • a determining module 503, used to determine the word weight value of each feature word in the text information based on the obtained feature weight values.
  • The text analysis model may include discriminators, and the text analysis model analyzes and recognizes the text information through the discriminators. The obtaining module 501 is specifically used to determine a target discriminator from the discriminators included in the text analysis model according to the output result, and to obtain the feature weight values that the target discriminator used for each feature word in the text information during analysis and recognition.
  • The text analysis model may be a classification model and may include multiple discriminators, each of which may correspond to a classification category. When the obtaining module 501 is used to determine the target discriminator from the discriminators included in the text analysis model according to the output result, it is specifically used to determine the discriminator corresponding to the target classification category included in the output result of the text analysis model as the target discriminator, where the target classification category is determined according to the recognition results obtained after the discriminators of the text analysis model analyze the text information.
  • The text analysis model may include multiple discriminators, and the recognition result of each discriminator may be a probability value. The output result may include a target probability value, which may be the maximum of the probability values output by the discriminators of the text analysis model. When the obtaining module 501 is used to determine the target discriminator from the discriminators included in the text analysis model according to the output result, it is specifically used to determine the discriminator that outputs the target probability value as the target discriminator.
  • The text analysis model may include multiple discriminators, and each discriminator may correspond to an identifier. When the obtaining module 501 is used to determine the target discriminator from the discriminators included in the text analysis model according to the output result, it is specifically used to determine the discriminator corresponding to the target identifier included in the output result of the text analysis model as the target discriminator, where the target identifier is determined according to the recognition results obtained after the discriminators of the text analysis model analyze and recognize the text information.
  • When the determining module 503 is used to determine the word weight value of each feature word in the text information based on the obtained feature weight values, it is specifically used to take the feature weight value used for each feature word in the text information as the word weight value of the corresponding feature word in the text information.
  • Each discriminator included in the text analysis model can be used to identify text information of a different classification category, and the same feature word in text information of different classification categories may have different feature weight values in the different discriminators included in the text analysis model.
  • The analysis module 502 is specifically used to perform word segmentation on the text information to obtain the feature words of the text information, and to use the feature words as the input of the text analysis model to obtain the output result of the text analysis model.
  • The information processing apparatus 50 may further include a training module 504, used to obtain training sample data that includes historical text information and annotation information, and to train the preset model based on the historical text information and the annotation information to obtain the aforementioned text analysis model.
  • FIG. 6 is a schematic structural diagram of a service device provided by an embodiment of the present application.
  • The service device 60 may include a network interface 601, a processor 602, and a memory 603. The network interface 601, the processor 602, and the memory 603 may be connected to each other through one or more communication buses, or in other ways.
  • The functions implemented by the obtaining module 501, the analysis module 502, the determining module 503, and the training module 504 shown in FIG. 5 may be implemented by the same processor 602 or by multiple different processors 602.
  • The network interface 601 may be used to send and receive data and/or signaling. In the embodiment of the present application, the network interface 601 may be used to obtain text information.
  • The processor 602 is configured to perform the corresponding functions of the service device in the method described in FIGS. 2-4.
  • The processor 602 may include one or more processors; for example, the processor 602 may be one or more central processing units (CPUs), network processors (NPs), hardware chips, or any combination thereof.
  • When the processor 602 is a CPU, the CPU may be a single-core CPU or a multi-core CPU.
  • The memory 603 is used to store program code and the like.
  • The memory 603 may include volatile memory, such as random access memory (RAM); the memory 603 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 603 may also include a combination of the above types of memory.
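  • The processor 602 may call the program code stored in the memory 603 to perform the following operations: obtain text information; invoke the text analysis model to analyze and recognize the text information and obtain the output result of the text analysis model; obtain, according to the output result, the feature weight values that the text analysis model used for each feature word in the text information during analysis and recognition; and determine the word weight value of each feature word in the text information based on the obtained feature weight values.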
  • The text analysis model may include discriminators, and the text analysis model analyzes and recognizes the text information through the discriminators. When the processor 602 obtains, according to the output result, the feature weight values that the text analysis model used for each feature word in the text information during analysis and recognition, it may specifically perform the following operations: determine a target discriminator from the discriminators included in the text analysis model according to the output result, and obtain the feature weight values that the target discriminator used for each feature word in the text information during analysis and recognition.
  • The text analysis model may be a classification model and may include multiple discriminators, each of which corresponds to a classification category. When the processor 602 determines the target discriminator from the discriminators included in the text analysis model according to the output result, it may specifically perform the following operation: determine the discriminator corresponding to the target classification category included in the output result of the text analysis model as the target discriminator, where the target classification category is determined according to the recognition results obtained after the discriminators of the text analysis model analyze the text information.
  • The text analysis model may include multiple discriminators, and the recognition result of each discriminator may be a probability value. The output result may include a target probability value, which may be the maximum of the probability values output by the discriminators of the text analysis model. When the processor 602 determines the target discriminator from the discriminators included in the text analysis model according to the output result, it may specifically perform the following operation: determine the discriminator that outputs the target probability value as the target discriminator.
  • The text analysis model may include multiple discriminators, and each discriminator may correspond to an identifier. When the processor 602 determines the target discriminator from the discriminators included in the text analysis model according to the output result, it may specifically perform the following operation: determine the discriminator corresponding to the target identifier included in the output result of the text analysis model as the target discriminator, where the target identifier is determined according to the recognition results obtained after the discriminators of the text analysis model analyze and recognize the text information.
  • When the processor 602 determines the word weight value of each feature word in the text information based on the obtained feature weight values, it may specifically perform the following operation: take the feature weight value used for each feature word in the text information as the word weight value of the corresponding feature word in the text information.
  • Each discriminator included in the text analysis model can be used to identify text information of a different classification category, and the same feature word in text information of different classification categories may have different feature weight values in the different discriminators included in the text analysis model.
  • When the processor 602 determines the word weight value of each feature word in the text information based on the obtained feature weight values, it may specifically perform the following operations: perform word segmentation on the text information to obtain the feature words of the text information, and use the feature words as the input of the text analysis model to obtain the output result of the text analysis model.
  • The processor 602 may also perform the following operations: obtain training sample data that includes historical text information and annotation information, and train the preset model based on the historical text information and the annotation information to obtain the foregoing text analysis model.
  • The processor 602 can also perform the operations corresponding to the service device in the embodiments shown in FIG. 2 to FIG. 4. For details, refer to the description in the method embodiments; they are not repeated here.
  • An embodiment of the present application also provides a computer-readable storage medium that can be used to store the computer software instructions used by the information processing apparatus in the embodiment shown in FIG. 5, including the program designed for the service device in the above embodiments.
  • The computer-readable storage medium includes but is not limited to flash memory, a hard disk, and a solid-state drive.
  • An embodiment of the present application also provides a computer program product. A computer device running the product may execute the method designed for the service device in the embodiments shown in FIGS. 2 to 4.
  • An embodiment of the present application further provides a chip including a processor and a memory. The memory is used to store a computer program, and the processor is used to call and run the computer program from the memory. The computer program is used to implement the method in the above method embodiments.
  • The computer program product includes one or more computer instructions.
  • The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium.
  • The computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave).
  • The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media.
  • The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state disk (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

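An information processing method based on machine learning, an apparatus implementing the method, a service device, and a computer-readable storage medium, applied to intelligent text processing in the field of artificial intelligence. The word weight of each word in text information can be used to evaluate the importance of that word in the text information. However, the word weight calculated by current methods has a low degree of association with the text information containing the word and does not accurately reflect the importance of the word in that text information, so the accuracy of the word weight is low. With the above method, a text analysis model is invoked to analyze and recognize the text information, and the word weight value of each feature word of the text information can be determined based on the output result of the text analysis model. The word weight values of the feature words are thus closely associated with the output result obtained by analyzing and recognizing the text information, which effectively improves the accuracy of the word weight values.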

Description

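Information processing method, apparatus, service device, and computer-readable storage medium
Technical Field
This application relates to the field of machine learning, and in particular to an information processing method, apparatus, service device, and computer-readable storage medium.
Background
The word weight of each word in text information can be used to evaluate the importance of that word in the text information. When applied to a search system, a question answering system, or another system, setting appropriate word weights for the words in the text information yields more accurate processing results.
At present, word weights are mainly calculated with term frequency-inverse document frequency (TF-IDF). The main idea of the TF-IDF algorithm is that if a word appears frequently in one document and rarely in other documents, the word is considered to have good category discrimination ability, that is, its word weight is high. The drawback of the TF-IDF algorithm is that the word weight of a word is determined mainly by the number of documents in the document collection that contain the word, so the association between the word weight and the text information containing the word is low. The word weight obtained in this way does not accurately reflect the importance of the word in that text information, so its accuracy is low. How to improve the accuracy of word weights has therefore become a technical problem to be solved urgently.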
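The TF-IDF baseline criticized above can be illustrated with a short sketch. It uses the plain textbook form, tf = count / document length and idf = log(N / df), which is an assumption; the text does not say which TF-IDF variant is meant, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: tf-idf weight} dict per segmented document."""
    n_docs = len(documents)
    # Number of documents that contain each word at least once.
    document_frequency = Counter(
        word for doc in documents for word in set(doc)
    )
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / document_frequency[word])
            for word, count in counts.items()
        })
    return weights
```

In a toy corpus where "video" occurs in every document, its IDF is log(1) = 0, so it receives the same weight in every query regardless of what the user intends; this is the weak link between word weight and the containing text that the paragraph above describes.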
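Summary
The embodiments of the present application provide an information processing method, an apparatus implementing the method, a service device, and a computer-readable storage medium. The word weight values of the feature words in text information can be determined based on the output result obtained by analyzing and recognizing the text information, so that the word weight values of the feature words are closely associated with that output result, which helps improve the accuracy of the word weight values.
In a first aspect, an embodiment of the present application provides an information processing method, including: obtaining text information; invoking a text analysis model to analyze and recognize the text information, and obtaining the output result of the text analysis model; obtaining, according to the output result, the feature weight values that the text analysis model used for each feature word in the text information during analysis and recognition; and determining the word weight value of each feature word in the text information based on the obtained feature weight values.
In this technical solution, the word weight value of each feature word in the text information is determined based on the output result obtained by analyzing and recognizing the text information, so that the word weight values are closely associated with that output result, that is, closely associated with the real user intent corresponding to the text information. In this way, the accuracy of the word weight values of the feature words can be improved.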
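In one implementation, the text analysis model includes discriminators, and the text analysis model analyzes and recognizes the text information through the discriminators. Obtaining, according to the output result, the feature weight values that the text analysis model used for each feature word in the text information during analysis and recognition may specifically be implemented as follows: determine a target discriminator from the discriminators included in the text analysis model according to the output result, and obtain the feature weight values that the target discriminator used for each feature word in the text information during analysis and recognition.
In this technical solution, the target discriminator is determined from the discriminators included in the text analysis model according to the output result of the text analysis model rather than selected at random, which improves the accuracy of the word weight values determined from the target discriminator.
In one implementation, the text analysis model may be a classification model and may include multiple discriminators, each of which corresponds to a classification category. Determining the target discriminator from the discriminators included in the text analysis model according to the output result may specifically be implemented as follows: determine the discriminator corresponding to the target classification category included in the output result of the text analysis model as the target discriminator, where the target classification category is determined according to the recognition results obtained after the discriminators of the text analysis model analyze the text information.
In this technical solution, the target classification category can be used to represent the real user intent of the text information. Determining the discriminator corresponding to the target classification category as the target discriminator, and then determining the word weight value of each feature word in the text information based on the feature weight values that the target discriminator used for the feature words during analysis and recognition, helps improve the accuracy of the word weight values.
In one implementation, the text analysis model may include multiple discriminators, and the recognition result of each discriminator may be a probability value. The output result may include a target probability value, which may be the maximum of the probability values output by the discriminators of the text analysis model. Determining the target discriminator from the discriminators included in the text analysis model according to the output result may specifically be implemented as follows: determine the discriminator that outputs the target probability value as the target discriminator.
In this technical solution, determining the discriminator that outputs the maximum probability value as the target discriminator improves the accuracy of the determined target discriminator.
In one implementation, the text analysis model may include multiple discriminators, and each discriminator may correspond to an identifier. Determining the target discriminator from the discriminators included in the text analysis model according to the output result may specifically be implemented as follows: determine the discriminator corresponding to the target identifier included in the output result of the text analysis model as the target discriminator, where the target identifier is determined according to the recognition results obtained after the discriminators of the text analysis model analyze and recognize the text information.
In one implementation, determining the word weight value of each feature word in the text information based on the obtained feature weight values may specifically be implemented as follows: take the feature weight value used for each feature word in the text information as the word weight value of the corresponding feature word in the text information.
In this technical solution, directly taking the feature weight value used for each feature word in the text information as the word weight value of the corresponding feature word improves the efficiency of determining the word weight values.
In one implementation, each discriminator included in the text analysis model can be used to identify text information of a different classification category, and the same feature word in text information of different classification categories may have different feature weight values in the different discriminators included in the text analysis model.
In this technical solution, the discriminators of the text analysis model analyze and recognize text information of different classification categories by means of feature weight values. Because the same feature word in text information of different classification categories has different feature weight values in different discriminators, the discriminators can accurately identify the classification category to which the text information belongs according to their different feature weight values.
In one implementation, determining the word weight value of each feature word in the text information based on the obtained feature weight values may specifically be implemented as follows: perform word segmentation on the text information to obtain the feature words of the text information, and use the feature words as the input of the text analysis model to obtain the output result of the text analysis model.
In this technical solution, it is only necessary to input the feature words of the text information into the text analysis model to obtain its output result, and the word weight values of the feature words are then obtained based on that output result. The process is simple and efficient: when the text information has multiple feature words, the text analysis model needs to be invoked only once to obtain the word weight values of all of them.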
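In one implementation, the method may further include: obtaining training sample data that includes historical text information and annotation information; and training a preset model based on the historical text information and the annotation information to obtain the foregoing text analysis model.
In one implementation, the text information may be query information, the historical text information may be historical query information, and the annotation information may be determined according to user operation data on the query results obtained by querying with the historical query information.
In this technical solution, the historical query information is real query information that users entered in the past, and the user operation data is obtained from the users' real operations; that is, the text analysis model is trained on real user feedback data. When the information processing method is applied to a search system, the output result that the text analysis model obtains by analyzing and recognizing query information better matches the real user intent corresponding to that query information, and the word weight values of the feature words obtained from that output result reflect the user's real search needs more objectively.
In one implementation, querying with the historical query information may return multiple query results. The user operation data may include the query results obtained for the historical query information, the number of times each query result was selected, and the classification category to which each query result belongs. Training the preset model based on the historical text information and the annotation information to obtain the text analysis model may specifically be implemented as follows: input the historical query information into the preset model as training data to obtain a training result; and optimize the parameters of the preset model according to the training result and the annotation information to obtain the text analysis model, where the annotation information may be a first classification category determined according to the user operation data. The first classification category may be the classification category of the query result that was selected most often among the query results obtained for the historical query information; alternatively, it may be the classification category whose query results have the largest total number of selections.
In this technical solution, the number of times each query result obtained for the historical query information was selected can be detected automatically, and the classification category of the most frequently selected query result is used as the annotation information. Training data and annotation information can thus be obtained automatically without manual labeling, which effectively reduces the cost of model training. In addition, the service device can optimize the model automatically, which effectively improves prediction accuracy.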
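The two ways of deriving the first classification category from user operation data can be sketched as follows. The `(result_name, category, selection_count)` tuple layout and both function names are assumptions made for illustration.

```python
from collections import defaultdict


def category_of_most_selected_result(results):
    """Category of the single query result with the highest selection count.

    results: non-empty list of (result_name, category, selection_count).
    """
    return max(results, key=lambda result: result[2])[1]


def category_with_most_selections(results):
    """Category whose query results have the largest total selection count."""
    totals = defaultdict(int)
    for _name, category, count in results:
        totals[category] += count
    return max(totals, key=totals.get)
```

The two rules can disagree: with results `("app A", "video", 50)`, `("app B", "children", 30)`, and `("app C", "children", 40)`, the first rule yields "video" and the second yields "children", so the choice of rule affects the label used for training.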
在一种实现方式中,基于获取的各个特征权重值,确定文本信息中的各个特征词的词权重值之后,该方法还可以包括:基于文本信息和该文本信息的各个特征词的词权重值进行信息处理。
在该技术方案中,基于文本信息和该文本信息的各个特征词的词权重值进行信息处理,可以得到更加符合用户意图的信息处理结果。
在一种实现方式中,该文本信息可以为查询信息,基于该文本信息和该文本信息的各个特征词的词权重值进行信息处理的具体实施方式可以为:基于该文本信息和该文本信息的各个特征词的词权重值,搜索得到该文本信息的第一查询结果,并输出第一查询结果;或者,基于该文本信息和该文本信息的各个特征词的词权重值进行信息处理的具体实施方式可以为:根据文本信息搜索得到第二查询结果,并基于该文本信息的各个特征词的词权重值,对第二查询结果进行排序,输出排序后的第二查询结果。
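In this technical solution, when the information processing method is applied to a search system, searching based on the word weight values of the feature words can effectively improve the accuracy of the first query result recalled by the search and make the first query result better match the user's search needs. In addition, sorting the second query result based on the word weight values of the feature words of the text information places the second query results that better match the user's search needs at the front when they are shown to the user, which can effectively improve the search effect.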
在一种实现方式中,基于该文本信息和该文本信息的各个特征词的词权重值进行信息处理的具体实施方式可以为:基于该文本信息的各个特征词的词权重值,在该文本信息的特征词中,确定出核心词和/或无效词。
在该技术方案中,当该信息处理方法应用于搜索系统时,核心词是最能代表文本信息对应的真实用户意图的特征词,相较于基于文本信息的所有特征词进行搜索,基于核心词进行搜索可以避免其他特征词对查询结果造成影响,而导致召回的查询结果不符合文本信息对应的真实用户意图,有利于提高搜索效果;另外,确定无效词之后,可以基于文本信息的特征词中除无效词以外的其他特征词进行搜索,通过基于文本信息的特征词中除无效词以外的其他特征词进行搜索,可以减少无效内容的召回,并提高召回内容的准确率。
第二方面,本申请实施例提供了一种信息处理装置,该装置具有实现第一方面所述的方法的功能。所述功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块。
第三方面,本申请实施例提供一种服务设备,该服务设备包括存储器和处理器,存储器中存储有程序指令,处理器通过总线与存储器连接,处理器调用存储器中存储的程序指令以使服务设备执行第一方面所述的方法。
第四方面,本申请实施例提供一种计算机可读存储介质,用于储存为第二方面所述的信息处理装置所用的计算机程序指令,其包含用于执行上述第一方面所涉及的程序。
第五方面,本申请实施例提供一种计算机程序产品,该程序产品包括程序,所述程序被执行时实现上述第一方面所述的方法。
附图说明
为了更清楚地说明本申请实施例或背景技术中的技术方案,下面将对本申请实施例或背景技术中所需要使用的附图进行说明。
图1是本申请实施例公开的一种通信系统的架构示意图;
图2是本申请实施例公开的一种信息处理方法的流程示意图;
图3是本申请实施例公开的另一种信息处理方法的流程示意图;
图3a是本申请实施例公开的一种获取目标分类类别的场景示意图;
图3b是本申请实施例公开的一种获取目标概率值的场景示意图;
图3c是本申请实施例公开的一种获取目标标识的场景示意图;
图4是本申请实施例公开的又一种信息处理方法的流程示意图;
图5是本申请实施例公开的一种信息处理装置的结构示意图;
图6是本申请实施例公开的一种服务设备的结构示意图。
具体实施方式
为了更好的理解本申请实施例公开的一种信息处理方法,下面首先对本申请实施例适用的通信系统进行描述。
请参见图1,图1是本申请实施例公开的一种通信系统的架构示意图。如图1所示,该通信系统包括终端设备101、服务设备102和多个数据服务器103。其中,该终端设备101可以是用户设备(user equipment,UE)、远程终端、移动终端、无线通信设备或用户装置等。用户可以通过终端设备101的输入设备在终端设备101显示的搜索框中输入查询语(例如家庭儿歌视频),然后点击搜索按钮,以便终端设备101检测到搜索按钮被点击时,通过网络将查询语发送给服务设备102(步骤S101);服务设备102可以用于对查询语进行分析识别,并基于分析识别的结果获得查询语的各个特征词的词权重值,进而基于各个特征词的词权重值从多个数据服务器103中获取搜索结果(步骤S102);然后将搜索得到的搜索结果发送给终端设备101,以便终端设备101在显示屏中输出搜索结果,以供用户根据自身需要进行选择。在一种实现方式中,服务设备102可以由处理器、存储器和网络接口组成,服务设备102可以是终端设备或者服务器,应用在本申请实施例中,服务设备102可以为搜索引擎服务器。
在一种实现方式中,图1中由服务设备102执行的步骤,可以由终端设备101替代执行,即终端设备101可以对查询语进行分析识别,并基于分析识别的结果获得查询语的各个特征词的词权重值,然后基于各个特征词的词权重值从多个数据服务器103中获取搜索结果。同理,在一种实现方式中,图1中由终端设备101执行的步骤,可以由服务设备102替代执行,即图1中服务设备102接收到的查询语,可以是服务设备102根据用户的输入操作得到的。
可以理解的是,本申请实施例描述的通信系统是为了更加清楚的说明本申请实施例的技术方案,并不构成对于本申请实施例提供的技术方案的限定,本领域普通技术人员可知,随着系统架构的演变和新业务场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
基于图1所示的通信系统的架构示意图,请参见图2,图2是本申请实施例提供的一种信息处理方法的流程示意图,该方法可以应用于搜索系统或者问答系统,该方法可以包括但不限于如下步骤:
步骤S201:服务设备获取文本信息。其中,文本信息可以是一个词,也可以是由多个词组成的句子。在一种实现方式中,当图2所示方法应用于搜索系统时,该文本信息可以是用户在查询搜索时输入的查询语,该查询语可以是以文本方式输入的,也可以是以语音方式输入的,当查询语以语音方式输入时,需要将语音格式的查询语转换为文本格式。在一种实现方式中,当图2所示方法应用于问答系统时,该文本信息可以是用户在询问时输入的问题,该问题可以是以文本方式输入的,也可以是以语音方式输入的,当问题以语音方式输入时,需要将语音格式的问题转换为文本格式。本申请实施例以文本信息为查询语为例进行说明。在一种实现方式中,文本信息可以是用户在终端设备中输入,并由该终端设备发送给服务设备的,或者,该文本信息也可以是用户在服务设备中输入的,本申请实施例对此不作限定。在一种实现方式中,服务设备可以是终端设备或者服务器。
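Step S202: The service device calls a text analysis model to analyze and recognize the text information, and obtains the output result of the text analysis model. The text analysis model may be a classification model or a regression model among machine learning models. When the text analysis model is a classification model, it may correspond to one or more classification categories. When the text analysis model corresponds to one classification category, it may be used to recognize whether the real user intent of the text information belongs to that classification category, and its output result may be used to indicate whether the word weight values of the feature words of the text information are to be determined according to the feature weight values that the text analysis model used for those feature words during analysis and recognition. In one implementation, the text analysis model may be trained on a large amount of real historical text information together with the classification categories fed back by the actual users who entered that historical text information; the text analysis model can therefore be used to recognize the real user intent of the text information. For example, when the historical text information is a query entered by a user during a search, searching for that query yields query results, and the classification category fed back by the user may be the category of the query result that the user selected. If the text analysis model recognizes that the real user intent of the text information belongs to the classification category corresponding to the text analysis model, the output result of the text analysis model indicates that the word weight values of the feature words of the text information are to be determined according to the feature weight values that the text analysis model used for those feature words during analysis and recognition. For example, when the classification category corresponding to the text analysis model is "nursery rhymes" and the text information is the query "family nursery rhyme video" (家庭儿歌视频), the text analysis model recognizes that the real user intent of this query belongs to "nursery rhymes"; the output result then indicates that the word weight values of the feature words of the query are to be determined according to the feature weight values that the model used for those feature words. In this way, the word weight values of the feature words of the text information are strongly correlated with the output result obtained by analyzing and recognizing the text information, that is, with the real user intent corresponding to the text information, which improves the accuracy of the word weight values of the feature words.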
在一种实现方式中,文本分析模型可以对应多个分类类别,并且文本分析模型可以包括多个判别器,其中,每一个判别器可以对应一个分类类别,文本分析模型可以通过判别器对文本信息进行分析识别,并且每一个判别器可以通过不同的特征权重值对文本信息进行分析识别,此时,该文本分析模型可以用于识别文本信息的真实用户意图属于该文本分析模型包括的哪一个判别器对应的分类类别。若文本信息的真实用户意图属于判别器1对应的分类类别,则文本分析模型的输出结果可以用于指示根据判别器1在分析识别时针对文本信息中的各个特征词所使用的特征权重值,确定文本信息的各个特征词的词权重值,其中,判别器1为文本分析模型包括的其中一个判别器。
在一种实现方式中,当该文本分析模型为回归模型时,该文本分析模型可以对应一个分类类别,该文本分析模型可以用于分析出该文本信息的真实用户意图属于该文本分析模型对应的分类类别的概率,即该文本分析模型的输出结果可以是一个概率值,当该概率值大于第一预设概率值阈值时,服务设备可以获取文本分析模型在分析识别时针对文本信息中的各个特征词所使用的特征权重值,并基于获取的各个的特征权重值确定该文本信息中的各个特征词的词权重值。其中,第一预设概率值阈值可以是服务设备默认设置的,也可以是服务设备根据用户的输入操作确定的,本申请实施例对此不作限定。
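In one implementation, a specific way for the service device to call the text analysis model to analyze and recognize the text information and obtain the output result may be: the service device performs word segmentation on the text information to obtain its feature words, and uses those feature words as the input of the text analysis model to obtain the model's output result. In this way, it is only necessary to input the feature words of the text information into the text analysis model to obtain the output result, and then obtain the word weight value of each feature word based on that output result. The process is simple and efficient: even when the text information has multiple feature words, the text analysis model needs to be called only once to obtain the word weight values of all of them. In one implementation, a specific way for the service device to segment the text information may be: the service device calls a word segmentation algorithm to split the text information into segmented words, and determines each segmented word as a feature word of the text information. In one implementation, the word segmentation algorithm may include, but is not limited to, string-matching-based algorithms (such as forward maximum matching, reverse maximum matching, minimum segmentation, and bidirectional maximum matching), understanding-based algorithms, and statistics-based algorithms; this is not limited in the embodiments of this application.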
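As an illustration of the string-matching family mentioned above, the following is a minimal sketch of forward maximum matching. The function name `forward_max_match` and the three-word vocabulary are assumptions made for this example; the source does not prescribe a particular segmenter or dictionary.

```python
def forward_max_match(text, vocabulary, max_len=None):
    """Greedy left-to-right segmentation: at each position take the longest
    vocabulary entry that matches, falling back to a single character."""
    if max_len is None:
        max_len = max((len(word) for word in vocabulary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical vocabulary covering the example query used in the text.
vocabulary = {"家庭", "儿歌", "视频"}
feature_words = forward_max_match("家庭儿歌视频", vocabulary)
print(feature_words)
```

Each resulting token would then be treated as one feature word and passed to the text analysis model in a single call.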
步骤S203:服务设备根据输出结果,获取文本分析模型在分析识别时针对文本信息中的各个特征词所使用的特征权重值。文本分析模型是使用文本信息中的各个特征词的特征权重值对文本信息进行分析识别的,若文本分析模型的输出结果指示根据文本分析模型在分析识别时针对文本信息中的各个特征词所使用的特征权重值确定文本信息的各个特征词的词权重值,则服务设备可以获取文本分析模型在分析识别时针对文本信息中的各个特征词所使用的特征权重值,并基于获取的各个特征权重值确定该文本信息中的各个特征词的词权重值。其中,文本分析模型在分析识别时针对文本信息中的各个特征词所使用的特征权重值可以是在训练过程中确定的,也可以是根据经验值设置的,本申请实施例对此不作限定。
步骤S204:服务设备基于获取的各个特征权重值,确定该文本信息中的各个特征词的词权重值。具体的,服务设备可以将文本分析模型针对文本信息中的各个特征词所使用的特征权重值作为该文本信息中的相应特征词的词权重值。例如,当文本信息为“家庭儿歌视频”,且“家庭儿歌视频”中的各个特征词分别为“家庭”、“儿歌”和“视频”时,服务设备可以将文本分析模型在分析识别时针对特征词“家庭”、“儿歌”和“视频”所使用的特征权重值分别作为“家庭”、“儿歌”和“视频”的词权重值。机器学习模型的传统使用方法是将作为机器学习模型的文本分析模型所使用的特征权重值作为分析识别过程中的参数,然后将文本分析模型的输出结果作为最终结果,然而本申请实施例直接将文本分析模型的参数(即特征权重值)作为文本信息的特征词的词权重值,与机器学习模型的传统使用方法有本质区别。
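The idea of reusing model parameters as word weights can be sketched with a small linear model. Everything concrete here is an assumption for illustration: the two categories, the numeric feature weights, and the use of a sigmoid over a weighted sum are made up, and the source does not fix the model family.

```python
import math

# Made-up feature weights for two hypothetical discriminators.
DISCRIMINATORS = {
    "nursery rhymes": {"家庭": 0.3, "儿歌": 2.1, "视频": 0.4},
    "video": {"家庭": 0.1, "儿歌": -0.5, "视频": 1.8},
}


def probability(words, feature_weights):
    """Score a query as the sigmoid of the summed feature weights of its words."""
    score = sum(feature_weights.get(word, 0.0) for word in words)
    return 1.0 / (1.0 + math.exp(-score))


def word_weights(words, discriminators):
    """Pick the discriminator with the highest probability and return its
    feature weights for the query's words as their word weight values."""
    probs = {cat: probability(words, fw) for cat, fw in discriminators.items()}
    target = max(probs, key=probs.get)
    weights = discriminators[target]
    return target, {word: weights.get(word, 0.0) for word in words}


category, weights = word_weights(["家庭", "儿歌", "视频"], DISCRIMINATORS)
print(category, weights)
```

The classification result is used only to choose which parameter set to read; the values handed to the downstream system are the parameters themselves.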
在一种实现方式中,文本信息中的特征词的个数可以为一个或多个,每个特征词在文本分析模型中均对应有一个特征权重值,服务设备基于获取的各个特征权重值,确定文本信息中的各个特征词的词权重值的具体实施方式可以为:服务设备对获取的各个特征权重值进行归一化处理,并将各个归一化处理后的特征权重值作为相应特征词的词权重值。
可见,通过实施本申请实施例,基于对文本信息进行分析识别得到的输出结果,确定文本信息中的各个特征词的词权重值,可以使得文本信息的各个特征词的词权重值与对该文本信息进行分析识别得到的输出结果之间的关联度较高,即使得文本信息的各个特征词的词权重值与文本信息对应的真实用户意图之间的关联度较高,有利于提高特征词的词权重值的准确度。
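Please refer to FIG. 3, which is a schematic flowchart of another information processing method provided by an embodiment of this application. The method may be applied to a search system or a question answering system, and may include, but is not limited to, the following steps: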
步骤S301:服务设备获取文本信息。需要说明的是,步骤S301的执行过程可参见图2中步骤S201的具体描述,在此不赘述。
步骤S302:服务设备调用文本分析模型对文本信息进行分析识别,并获取文本分析模型的输出结果,该文本分析模型包括判别器,该文本分析模型是通过判别器对文本信息进行分析识别的。具体的,文本分析模型可以包括一个或多个判别器,当文本分析模型包括一个判别器时,文本分析模型的输出结果可以用于指示是否将文本分析模型中的判别器确定为目标判别器;当文本分析模型包括多个判别器时,根据文本分析模型的输出结果,可以从文本分析模型中的多个判别器中确定出目标判别器。在一种实现方式中,文本分析模型可以为分类模型,文本分析模型中的每一个判别器可以对应一个分类类别,文本分析模型中的每一个判别器对文本信息进行分析识别的识别结果可以为一个概率值,该概率值可以用于表征该文本信息属于输出该概率值的判别器对应的分类类别的概率。
在一种实现方式中,当文本分析模型包括一个判别器时,文本分析模型的输出结果可以是文本分析模型中的判别器对文本信息进行分析识别得到的概率值,若该概率值大于第二预设概率值阈值,则服务设备可以将文本分析模型中的判别器确定为目标判别器。在一种实现方式中,第二预设概率值阈值可以是服务设备默认设置的,也可以是服务设备根据用户的输入操作确定的,本申请实施例对此不作限定。
在一种实现方式中,当文本分析模型包括多个判别器,且每一个判别器对应一个分类类别时,文本分析模型的输出结果可以包括目标分类类别,进一步的,服务设备可以基于分类类别与判别器之间的对应关系,将与目标分类类别对应的判别器确定为目标判别器。其中,文本分析模型中的不同判别器对应的分类类别不同,该目标分类类别可以是根据文本分析模型的各个判别器对文本信息进行分析后得到的识别结果确定的,且各个判别器对文本信息进行分析后得到的识别结果可以为一个概率值。以图3a所示的一种获取目标分类类别的场景示意图为例,当文本分析模型包括3个判别器(判别器1、判别器2和判别器3),判别器1、判别器2和判别器3分别与工具类、学习类和儿歌类对应,且3个判别器对文本信息“家庭儿歌视频”进行分析识别得到的概率值分别为0.1、0.2和0.95时,文本分析模型的输出结果可以包括与输出最大概率值0.95的判别器对应的类别,即目标分类类别可以为儿歌类,进一步的,服务设备可以将儿歌类对应的判别器3作为目标判别器。其中,目标分类类别可以用于表征文本信息的真实用户意图,通过将目标分类类别对应的判别器确定为目标判别器,进而基于目标判别器在分析识别时针对文本信息中的各个特征词所使用的特征权重值,确定文本信息中的各个特征词的词权重值,有利于提高词权重值的准确度。
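In one implementation, when the text analysis model includes multiple discriminators, the output result of the text analysis model may include a target probability value, and the service device may then determine the discriminator that outputs the target probability value as the target discriminator. Different discriminators in the text analysis model obtain different probability values when analyzing and recognizing the text information, and the target probability value may be the maximum of the probability values output by the discriminators. Taking the scenario of obtaining a target probability value shown in FIG. 3b as an example, when the text analysis model includes three discriminators (discriminator 1, discriminator 2, and discriminator 3), and the probability values they obtain for the text information "family nursery rhyme video" (家庭儿歌视频) are 0.1, 0.2, and 0.95 respectively, the output result of the text analysis model may include the maximum of the three probability values, that is, the target probability value may be 0.95; the service device may then take discriminator 3, which outputs the target probability value 0.95, as the target discriminator. Determining the discriminator that outputs the maximum probability value as the target discriminator can improve the accuracy of the determined target discriminator.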
在一种实现方式中,当文本分析模型包括多个判别器时,文本分析模型的输出结果可以包括目标标识,进一步的,服务设备可以将目标标识对应的判别器确定为目标判别器。其中,目标标识用于唯一标识一个判别器,该目标标识可以是根据文本分析模型的各个判别器对文本信息进行分析后得到的识别结果确定的,且各个判别器对文本信息进行分析后得到的识别结果可以为一个概率值。以图3c所示的一种获取目标标识的场景示意图为例,当文本分析模型包括3个判别器(判别器1、判别器2和判别器3),3个判别器的标识分别为标识1、标识2和标识3,且3个判别器对文本信息“家庭儿歌视频”进行分析识别得到的概率值分别为0.1、0.2和0.95时,文本分析模型的输出结果可以包括输出最大概率0.95的判别器的标识,即目标标识可以为标识3,进一步的,服务设备可以将标识3对应的判别器3作为目标判别器。
步骤S303:服务设备根据该输出结果从文本分析模型包括的判别器中确定出目标判别器,并获取目标判别器在分析识别时针对文本信息中的各个特征词所使用的特征权重值。具体的,若输出结果包括目标分类类别,则服务设备可以基于分类类别与判别器之间的对应关系,将与目标分类类别对应的判别器确定为目标判别器;若输出结果包括目标概率值,则服务设备可以根据各个判别器输出的概率值,将输出该目标概率值的判别器确定为目标判别器;若输出结果包括目标标识,则服务设备可以基于标识与判别器之间的对应关系,将目标标识对应的判别器确定为目标判别器。
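The three ways of identifying the target discriminator (by target classification category, by target probability value, or by target identifier) all lead to the same discriminator. A minimal sketch using the numbers from the FIG. 3a to FIG. 3c examples follows; the `Discriminator` class, the identifier strings, and the optional `threshold` argument (standing in for the preset probability threshold of the single-discriminator case) are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Discriminator:
    ident: str          # identifier that uniquely marks the discriminator
    category: str       # classification category it corresponds to
    probability: float  # recognition result for the current text information


def select_target(discriminators, threshold=None):
    """Return the discriminator with the maximum probability value, or None
    when a threshold is given and the maximum does not exceed it."""
    best = max(discriminators, key=lambda d: d.probability)
    if threshold is not None and best.probability <= threshold:
        return None
    return best


discriminators = [
    Discriminator("id1", "tools", 0.1),
    Discriminator("id2", "learning", 0.2),
    Discriminator("id3", "nursery rhymes", 0.95),
]
target = select_target(discriminators)

# The output result may carry any one of these three values.
by_category = {d.category: d for d in discriminators}
by_ident = {d.ident: d for d in discriminators}
print(target.category, target.probability, target.ident)
```

Looking up `by_category["nursery rhymes"]` or `by_ident["id3"]` returns the same object as `target`, which is why the three output forms are interchangeable when categories and identifiers are unique.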
在本申请实施例中,目标判别器是基于针对文本信息中的各个特征词所使用的特征权重值对该文本信息进行分析识别的,针对文本信息中的各个特征词所使用的特征权重值可以是目标判别器在训练过程中确定的。在一种实现方式中,当文本分析模型包括多个判别器时,不同判别器对文本信息进行分析识别时针对文本信息中的各个特征词所使用的特征权重值均可以不同,或者,不同判别器在对文本信息进行分析识别时针对文本信息中的部分特征词所使用的特征权重值可以不同,并且针对文本信息中的另一部分特征词所使用的特征权重值可以相同,本申请实施例对此不作限定。本申请实施例根据文本分析模型的输出结果从文本分析模型包括的判别器中确定出目标判别器,而非从文本分析模型中随机确定出目标判别器,有利于提高根据目标判别器确定出的特征词的词权重值的准确度。
步骤S304:服务设备将目标判别器在分析识别时针对文本信息中的各个特征词所使用的特征权重值作为文本信息中的相应特征词的词权重值。在一种实现方式中,服务设备可以将各个判别器在对文本信息进行分析识别时针对文本信息中的各个特征词所使用的特征权重值记录于数据库中,以便在确定出目标判别器之后,服务设备可以从数据库中提取出目标判别器在分析识别时针对文本信息中的各个特征词所使用的特征权重值,进而将提取出的各个特征权重值直接作为相应特征词的词权重值。通过将针对文本信息中的各个特征词所使用的特征权重值直接作为文本信息中的相应特征词的词权重值,可以提高确定词权重值的效率。
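Step S305: The service device performs information processing based on the text information and the word weight values of the feature words of the text information. Doing so can yield an information processing result that better matches the user's intent.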
在一种实现方式中,当图3所述方法应用于搜索系统时,该文本信息可以为查询语,服务设备根据查询语中的各个特征词的词权重值对该查询语进行搜索,可以使得搜索召回的结果更加符合用户搜索需求。在一种实现方式中,当图3所述方法应用于问答系统时,该文本信息可以为问题,服务设备根据问题中的各个特征词的词权重值对该问题进行搜索,可以得到更加符合用户意图的答案。
在一种实现方式中,文本信息可以为查询信息,服务设备基于文本信息和该文本信息的各个特征词的词权重值进行信息处理的具体实施方式可以为:服务设备基于查询信息和该查询信息的各个特征词的词权重值,搜索得到该查询信息的第一查询结果,并输出第一查询结果。其中,第一查询结果可以是服务设备根据查询信息的各个特征词的词权重值对查询信息的特征词进行加权处理后搜索得到的,通过这种方式,当图3所述方法应用于搜索系统时,可以有效提高搜索召回的第一查询结果的准确率,并且可以使得第一查询结果更加符合用户搜索需求。
在一种实现方式中,文本信息可以为查询信息,服务设备基于文本信息和该文本信息的各个特征词的词权重值进行信息处理的具体实施方式还可以为:根据查询信息搜索得到第二查询结果,并基于查询信息的各个特征词的词权重值,对第二查询结果进行排序,输出排序后的第二查询结果。其中,第二查询结果可以是基于查询信息的各个特征词的词权重值进行搜索得到的,或者,若服务设备在根据查询信息进行搜索时,无法通过文本分析模型获取查询信息的各个特征词的词权重值,则第二查询结果可以是基于为查询信息中的各个特征词设置的默认权重值进行搜索得到的。在一种实现方式中,默认权重值可以是服务设备根据预先设定的经验值设置的,也可以是根据TF-IDF算法计算得到的词权重值,本申请实施例对此不作限定。服务设备在获取第二查询结果之后,基于查询信息的各个特征词的词权重值,对第二查询结果进行排序,可以优先输出更符合用户搜索需求的第二查询结果,即将更符合用户搜索需求的第二查询结果排在前面展示给用户,可以有效提高搜索效果。
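Re-ranking the second query result by word weight can be sketched as follows. The scoring rule (sum the weights of the feature words that occur in a result's title), the result titles, and the weight values are all assumptions for illustration; the source does not specify how the ordering score is computed.

```python
def rerank(results, word_weights):
    """Sort result titles by the total weight of the feature words they contain."""
    def score(title):
        return sum(weight for word, weight in word_weights.items() if word in title)
    # sorted() is stable, so results with equal scores keep their original order.
    return sorted(results, key=score, reverse=True)


# Hypothetical word weights for the query "家庭儿歌视频" and hypothetical result titles.
word_weights = {"家庭": 0.2, "儿歌": 0.7, "视频": 0.1}
second_query_result = ["视频播放器", "宝宝儿歌", "家庭儿歌大全"]
ranked = rerank(second_query_result, word_weights)
print(ranked)
```

With these numbers the nursery-rhyme results move ahead of the generic video player, which is the effect the paragraph describes.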
在一种实现方式中,文本信息可以为查询信息,服务设备基于文本信息和该文本信息的各个特征词的词权重值进行信息处理的具体实施方式还可以为:根据查询信息搜索得到第二查询结果,对查询信息的各个特征词的词权重值进行归一化处理,基于归一化之后的各个特征词的词权重值,对第二查询结果进行排序,并输出排序后的第二查询结果。例如,若查询信息包括的3个特征词分别为特征词1、特征词2和特征词3,且各自的词权重值分别为1.2、0.8和0.2时,归一化之后的特征词1、特征词2和特征词3的词权重值分别为:1.2/2.2、0.8/2.2和0.2/2.2。
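The normalization in this example divides each word weight by the sum of all word weights. A minimal sketch, assuming non-negative weights (the source does not say how negative values would be handled) and using the hypothetical function name `normalize`:

```python
def normalize(word_weights):
    """Divide each word weight by the sum of all weights so they add up to 1."""
    total = sum(word_weights.values())
    if total == 0:
        return dict(word_weights)  # nothing to scale
    return {word: value / total for word, value in word_weights.items()}


weights = {"feature word 1": 1.2, "feature word 2": 0.8, "feature word 3": 0.2}
normalized = normalize(weights)
print(normalized)
```

The results correspond to 1.2/2.2, 0.8/2.2, and 0.2/2.2 from the example above.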
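In one implementation, a specific way for the service device to perform information processing based on the text information and the word weight values of its feature words may also be: determining, among the feature words of the text information, a core word and/or invalid words based on the word weight values of the feature words. Specifically, when the text information is query information, the service device may take the feature word with the largest weight value as the core word of the text information and search based on the core word. When the method of FIG. 3 is applied to a search system, the core word is the feature word that best represents the real user intent corresponding to the text information. Compared with searching on all feature words of the text information, searching on the core word prevents the other feature words from influencing the query results and causing recalled results that do not match the real user intent, which helps improve the search effect.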
在一种实现方式中,当文本信息为查询信息时,服务设备还可以在查询信息的特征词中,将权重值最大的特征词作为该查询信息的核心词,并获取该核心词的同义词,然后基于该核心词和同义词进行搜索。通过对核心词进行同义词扩展,然后基于核心词和核心词的同义词进行搜索,可以召回更多的查询结果,进而为用户提供更多的选择。例如,查询信息为“什么软件看美职篮好”,且“美职篮”为该查询信息的核心词时,通过扩展核心词的同义词,得到“NBA”,基于“美职篮”和“NBA”进行搜索,可以召回更多查询结果。在一种实现方式中,若基于核心词搜索召回的查询结果较少,则服务设备可以获取核心词的同义词,然后基于核心词和同义词再次进行搜索,并输出再次搜索后得到的查询结果。在一种实现方式中,服务设备可以预先存储有同义词数据库,服务设备可以通过查询同义词数据库获取核心词的同义词,若同义词数据库中不存在该核心词的同义词,则服务设备可以向云服务器请求获取该核心词的同义词。
在一种实现方式中,当文本信息为查询信息时,服务设备可以在查询信息的特征词中,将权重值小于预设权重值阈值的特征词确定为无效词。在一种实现方式中,服务设备确定无效词之后,可以基于查询信息的特征词中除无效词以外的其他特征词进行搜索,通过基于查询信息的特征词中除无效词以外的其他特征词进行搜索,可以减少无效内容的召回,并提高召回内容的准确率。其中,预设权重值阈值可以是服务设备默认设置的,也可以是服务设备根据用户的输入操作确定的,本申请实施例对此不作限定。
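Core-word selection, synonym expansion, and invalid-word filtering can be sketched together. The segmentation of the example query, the weight values, the threshold 0.2, and the in-memory synonym table are assumptions of this sketch; only the query, the core word "美职篮", and its synonym "NBA" come from the text.

```python
def core_word(word_weights):
    """The feature word with the largest word weight value."""
    return max(word_weights, key=word_weights.get)


def invalid_words(word_weights, threshold):
    """Feature words whose weight is below the preset weight threshold."""
    return [word for word, value in word_weights.items() if value < threshold]


def expand_with_synonyms(word, synonym_table):
    """The core word followed by any synonyms found in the local table."""
    return [word] + synonym_table.get(word, [])


# Hypothetical weights for the query "什么软件看美职篮好".
word_weights = {"什么": 0.05, "软件": 0.3, "看": 0.1, "美职篮": 0.9, "好": 0.05}
synonym_table = {"美职篮": ["NBA"]}

core = core_word(word_weights)
search_terms = expand_with_synonyms(core, synonym_table)
dropped = invalid_words(word_weights, threshold=0.2)
kept = [word for word in word_weights if word not in dropped]
print(core, search_terms, dropped, kept)
```

A real deployment would query a synonym database, and possibly a cloud server when no local entry exists, in place of the dictionary lookup used here.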
可见,通过实施本申请实施例,在文本分析模型包括的判别器中确定出目标判别器,进而将目标判别器在分析识别时针对文本信息中的各个特征词所使用的特征权重值作为文本信息的相应特征词的词权重值,可以提高特征词的词权重值的准确度。另外,基于文本信息和该文本信息的各个特征词的词权重值进行信息处理,可以得到更加符合用户意图的信息处理结果。
请参见图4,图4是本申请实施例提供的又一种信息处理方法的流程示意图,该方法可以应用于搜索系统或者问答系统,该方法可以包括但不限于如下步骤:
步骤S401:服务设备获取文本信息。
需要说明的是,步骤S401的执行过程可参见图2中步骤S201的具体描述,在此不赘述。
步骤S402:服务设备调用文本分析模型,通过文本分析模型中的各个判别器对文本信息进行分析识别,并获取文本分析模型输出的目标分类类别,其中,该文本分析模型为分类模型,该文本分析模型包括多个判别器,每一个判别器对应一个分类类别。每一个判别器进行分析识别的识别结果为一个概率值,每一个判别器输出的概率值可以用于表征该文本信息属于输出该概率值的判别器对应的分类类别的概率。在一种实现方式中,服务设备可以确定文本分析模型中的各个判别器输出的概率值中的最大概率值,并将输出该最大概率值的判别器对应的分类类别作为目标分类类别,其中,目标分类类别可以用于表征文本信息的真实用户意图。通过将输出最大概率值的判别器对应的分类类别作为文本信息的真实用户意图,可以提高确定出的真实用户意图的准确度。
在一种实现方式中,判别器是通过文本信息中的各个特征词的特征权重值对该文本信息进行分析识别的,不同判别器对同一文本信息进行分析识别时针对文本信息中的各个特征词所使用的特征权重值均可以不同,或者,不同判别器在对同一文本信息进行分析识别时针对文本信息中的部分特征词所使用的特征权重值可以不同,并且针对文本信息中的另一部分特征词所使用的特征权重值可以相同,因此,不同判别器对同一文本信息进行分析识别得到的概率值不同。在一种实现方式中,文本分析模型中的不同判别器对应的分类类别不同。
在一种实现方式中,文本分析模型可以是基于训练样本数据训练得到的。具体的,服务设备训练得到文本分析模型的具体实施方式可以为:服务设备获取训练样本数据,该训练样本数据包括历史文本信息和标注信息,并基于历史文本信息和标注信息,对预设模型进行训练,得到前述文本分析模型。
其中,预设模型是还未经过训练的模型。在一种实现方式中,当图4所示方法应用于搜索系统时,该历史文本信息可以是用户以往在查询搜索时输入的查询语(即历史查询语),标注信息可以是根据对该历史查询语搜索得到的搜索结果所属的分类类别确定的。例如,对该历史查询语搜索得到的搜索结果的数量为3个,并且用户选择了其中的1个搜索结果,则标注信息为用户选择的搜索结果所属的分类类别。在一种实现方式中,当图4所示方法应用于问答系统时,该历史文本信息可以是用户以往在询问时输入的问题(即历史问题),标注信息可以是根据对该历史问题搜索得到的答案所属的分类类别确定的。例如,对该历史问题搜索得到的答案的数量为3个,并且用户选择了其中的1个答案,则标注信息为用户选择的答案所属的分类类别。在一种实现方式中,预设模型可以为文本信息中的各个特征词设置初始特征权重值,服务设备可以基于历史文本信息和标注信息,对预设模型为文本信息中的各个特征词设置的初始特征权重值进行优化,以得到前述文本分析模型。
在一种实现方式中,当文本信息为查询信息时,前述历史文本信息可以为历史查询信息,前述标注信息可以是根据对历史查询信息查询得到的查询结果的用户操作数据确定的,服务设备可以获取历史查询信息和对历史查询信息查询得到的查询结果的用户操作数据,并根据对历史查询信息查询得到的查询结果的用户操作数据自动确定标注信息,并基于历史查询信息和标注信息,对预设模型进行训练,得到前述文本分析模型,其中,历史查询信息是用户以往输入的真实查询信息,用户操作数据是输入真实查询信息之后根据用户的真实操作得到的数据,即该用户操作数据可以是根据用户反馈的数据得到的,换言之,该文本分析模型是基于真实的用户反馈数据训练得到的。通过这种方式,当图4所示方法应用于搜索系统时,可以使得文本分析模型对查询信息进行分析识别得到的目标分类类别更加符合该查询信息对应的真实用户意图,进一步的,基于该目标分类类别得到的各个特征词的词权重值,可以更加客观地反映用户的真实搜索需求。
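In one implementation, the service device may store all received query information (both historical and current) in a log file, and may accordingly obtain a large amount of historical query information by querying the log file. Searching on a piece of historical query information yields one or more search results, from which a user may select the ones he or she needs; different users who enter the same historical query information and receive the same search results may select the same or different search results. In one implementation, the service device may determine the search results selected by users as the query results obtained for that historical query information, and there may be multiple such query results.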
具体的,对该历史查询信息查询得到的查询结果的用户操作数据可以包括:对该历史查询信息查询得到的查询结果及每个查询结果的选择次数、以及每个查询结果所属的分类类别。其中,每个查询结果的选择次数可以是通过统计每个查询结果被用户点击浏览、下载或者进行其他操作的次数得到的。在一种实现方式中,服务设备中可以预先存储有对该历史查询信息查询得到的每个查询结果所属的分类类别,例如,在某视频服务器中存储了大量的视频内容,其中,视频服务器在存储每个视频内容时,会为每个视频内容设置相应的分类类别,以便后续用户搜索时,可以基于视频内容的分类类别搜索得到更加符合用户需求的视频内容。需要说明的是,上述举例并非穷举,在应用下载服务器、电商服务器或者其他服务器中均会设置并存储各个内容的分类类别;另外,在问答对话系统中,也可以将历史用户问题作为历史查询信息,并将用户输入该历史用户问题之后,用户选择的操作所属的类别作为该历史用户问题对应的分类类别,用户选择的操作所属的类别可以用于表征用户输入该历史用户问题的真正意图。例如,在问答对话系统中,若用户输入的历史用户问题为“如何关闭相机的闪光灯”,并且输入该历史用户问题之后问答对话系统提供的两个选项分别为相机设置和系统设置,且用户选择了相机设置,即表明用户输入该历史用户问题的真正意图在于进行相机参数设置。
在一种实现方式中,预设模型可以是多分类模型,并且该预设模型可以是one vs rest模式的多分类模型,具体实现中,该预设模型可以为支持向量机(Support Vector Machine,SVM)、线性SVM、逻辑回归(Logistic Regression,LR)、梯度提升决策树(Gradient Boosting Decision Tree,GBDT)、随机森林(Random Forest,RF)或稀疏树(Sparse Tree,ST)等模型,本申请实施例对此不作限定。
在一种实现方式中,服务设备基于历史文本信息和标注信息,对预设模型进行训练,得到前述文本分析模型的具体实施方式可以为:服务设备将历史查询信息作为训练数据输入到预设模型中,得到训练结果,并根据该训练结果和标注信息对预设模型进行参数优化,以得到文本分析模型,其中,标注信息可以为根据用户操作数据确定的第一分类类别。具体的,服务设备将历史查询信息作为训练数据输入到预设模型中,以便预设模型对历史查询信息对应的真实用户意图进行预测,并将预测得到的预测类别作为训练结果。若预测类别和第一分类类别不一致,则表明预测类别不准确,需要对预设模型进行参数优化,使得优化后的预设模型对历史查询信息的真实用户意图进行预测时,得到的预测类别与第一分类类别一致。
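The first classification category may be the classification category of the query result with the most selections among the query results obtained for the historical query information, or it may be the category for which the sum of the selection counts of its query results is the largest. When the service device determines the search results selected by users as the query results for the historical query information, it may also automatically compute the sum of the selection counts of all query results belonging to the same classification category, compare these sums across categories to find the maximum, and determine the category with the maximum sum as the first classification category. The first classification category may be used to represent the real user intent corresponding to the historical query information. For example, suppose a user enters query information 1 and the search returns 10 search results, of which only 3 are selected by users; these 3 search results are the query results for query information 1, and their selection counts and categories are shown in Table 1. As Table 1 shows, the selected query results under category 1 are query result 1 and query result 3, and the selected query result under category 2 is query result 2. The sum of the selection counts of the query results under category 1 (query result 1 and query result 3) is larger than that under category 2, so category 1 can be determined as the first classification category, and the real user intent behind query information 1 can be regarded as obtaining content of category 1. The service device can thus determine the first classification category automatically, without manual labeling, which helps reduce model training cost and also avoids the case in which a labeler's subjectivity makes a manually labeled first classification category fail to reflect the real user intent behind the query information.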
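Table 1: Selection counts of query results and the categories they belong to
Query result ID    Selection count    Category
Query result 1     100                Category 1
Query result 2     30                 Category 2
Query result 3     8                  Category 1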
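Both ways of deriving the first classification category from the data in Table 1 can be sketched as follows. The tuple layout and the function names are assumptions of this sketch; the counts and categories are those of Table 1.

```python
from collections import defaultdict


def first_category_by_sum(rows):
    """Category whose query results have the largest total selection count."""
    totals = defaultdict(int)
    for _result_id, count, category in rows:
        totals[category] += count
    return max(totals, key=totals.get)


def first_category_by_top_result(rows):
    """Category of the single query result selected most often."""
    return max(rows, key=lambda row: row[1])[2]


# (query result ID, selection count, category), as in Table 1.
rows = [
    ("query result 1", 100, "category 1"),
    ("query result 2", 30, "category 2"),
    ("query result 3", 8, "category 1"),
]
label_by_sum = first_category_by_sum(rows)
label_by_top = first_category_by_top_result(rows)
print(label_by_sum, label_by_top)
```

For Table 1 the two rules agree on category 1 (108 selections in total against 30). They can disagree on other data, for instance when one category has many moderately popular results.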
在一种实现方式中,预设模型可以包括多个初始判别器,前述训练结果可以是预设模型中的初始判别器对历史查询信息进行分析识别得到的。在一种实现方式中,服务设备根据前述训练结果和标注信息对预设模型进行参数优化的具体实施方式可以为:根据该训练结果和第一分类类别对预设模型中的初始判别器所使用的初始特征权重值进行优化。
具体的,前述训练结果可以是预设模型根据各个初始判别器对历史查询信息进行分析识别得到的识别结果确定的,每个初始判别器可以是根据为历史查询信息的各个特征词设置的初始特征权重值,对该历史查询信息进行分析识别的,每个初始判别器进行分析识别得到的识别结果可以为一个概率值,该概率值可以表征该历史查询信息对应的真实用户意图为输出该概率值的初始判别器对应的分类类别的概率,前述训练结果可以为输出最大概率值的初始判别器对应的分类类别。
在一种实现方式中,若前述训练结果(即预测类别)与第一分类类别不同,则服务设备可以对预设模型中的部分初始判别器所使用的初始特征权重值进行优化。例如,服务设备可以对目标初始判别器为历史查询信息的各个特征词设置的初始特征权重值进行修改,其中,目标初始判别器与预测类别相对应,并且基于修改后的初始特征权重值,预设模型对历史查询信息进行分析识别得到的分类类别与第一分类类别相同。在一种实现方式中,若前述预测类别与第一分类类别不同,则服务设备也可以对预设模型中的全部初始判别器所使用的初始特征权重值进行优化,本申请实施例对此不作限定。
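The source does not state which update rule is used to optimize the initial feature weight values. Purely as an illustration, the sketch below swaps in a multi-class perceptron-style update: when the predicted category differs from the first classification category, the weights of the query's feature words are raised in the labeled discriminator and lowered in the wrongly predicted one. The category names, the training samples, and the zero initial weights are assumptions.

```python
def predict(weights, words):
    """Category whose discriminator gives the highest summed feature weight."""
    scores = {cat: sum(fw.get(w, 0.0) for w in words) for cat, fw in weights.items()}
    return max(scores, key=scores.get)


def train(samples, categories, epochs=10, step=1.0):
    """Adjust per-discriminator feature weights until predictions match labels."""
    weights = {cat: {} for cat in categories}  # initial feature weights are 0
    for _ in range(epochs):
        errors = 0
        for words, label in samples:
            guess = predict(weights, words)
            if guess != label:
                errors += 1
                for w in words:
                    weights[label][w] = weights[label].get(w, 0.0) + step
                    weights[guess][w] = weights[guess].get(w, 0.0) - step
        if errors == 0:
            break  # every historical query is now classified as labeled
    return weights


# Hypothetical historical queries (already segmented) with their first categories.
samples = [
    (["家庭", "儿歌", "视频"], "nursery rhymes"),
    (["视频", "广告", "跳过"], "video"),
]
weights = train(samples, ["video", "nursery rhymes"])
print(weights["nursery rhymes"])
```

On this toy data the word "视频" ends with a lower weight than "儿歌" in the nursery-rhyme discriminator, because it appears in queries of both categories and therefore does not help separate them.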
在本申请实施例中,服务设备可以自动获取训练数据和标注信息,并自动完成模型训练,而不用人工标注数据,有利于降低模型训练成本。另外,服务设备可以自动优化模型,从而有效提高预测准确率。
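In one implementation, after the text analysis model has been trained, the service device may obtain test sample data. The test sample data has the same form as the training sample data, that is, it includes a large amount of historical query information and the category of each piece of historical query information; note that the historical query information in the test sample data was not input to the preset model as training data when the text analysis model was trained. After obtaining the test sample data, the service device may call the text analysis model to analyze and recognize each piece of historical query information in the test sample data, and compare the target classification category output by the text analysis model with the category of the corresponding historical query information in the test sample data: if the two differ, the prediction is wrong; if they are the same, the prediction is correct. The prediction accuracy is obtained by counting these outcomes. If the prediction accuracy is less than a preset accuracy threshold, the service device may optimize the parameters of the text analysis model so that the prediction accuracy of the optimized model on the test sample data is greater than or equal to the preset accuracy threshold. Tuning the model based on prediction accuracy ensures the accuracy of the text analysis model and, in turn, helps improve the accuracy of the word weight values of the feature words. In one implementation, the preset accuracy threshold may be set by the service device by default, or may be determined by the service device according to a user's input operation; this is not limited in the embodiments of this application.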
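The accuracy check on held-out test samples can be sketched as below. The stand-in predictor `toy_predict`, the three test samples, and the threshold 0.9 are assumptions for illustration only.

```python
def prediction_accuracy(predict, test_samples):
    """Fraction of test samples whose predicted category equals the recorded one."""
    if not test_samples:
        return 0.0
    correct = sum(1 for words, label in test_samples if predict(words) == label)
    return correct / len(test_samples)


def needs_tuning(accuracy, threshold):
    """True when the accuracy is below the preset accuracy threshold."""
    return accuracy < threshold


def toy_predict(words):
    # Stand-in for the trained text analysis model.
    return "nursery rhymes" if "儿歌" in words else "video"


test_samples = [
    (["家庭", "儿歌", "视频"], "nursery rhymes"),
    (["视频", "广告", "跳过"], "video"),
    (["儿歌", "视频", "下载"], "video"),  # the toy predictor gets this one wrong
]
accuracy = prediction_accuracy(toy_predict, test_samples)
print(accuracy, needs_tuning(accuracy, threshold=0.9))
```

Two of the three samples are classified correctly here, so with a threshold of 0.9 the model would go through another round of parameter optimization.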
步骤S403:服务设备基于分类类别与判别器之间的对应关系,将目标分类类别对应的判别器作为目标判别器,并获取目标判别器在分析识别时针对文本信息中的各个特征词所使用的特征权重值。其中,目标判别器可以为输出前述最大概率值的判别器。
在一种实现方式中,文本分析模型中的各个判别器均是使用为文本信息中的各个特征词设置的特征权重值,对文本信息进行分析识别的,各个判别器使用的特征权重值是各个判别器在训练过程中确定的。在一种实现方式中,不同判别器在分析识别时对文本信息中的各个特征词使用的特征权重值均可以不同。在一种实现方式中,文本分析模型包括的各个判别器可以用于识别不同分类类别的文本信息,换言之,当文本分析模型中的判别器用于识别属于该判别器对应的分类类别的文本信息时,该判别器输出的概率值相较文本分析模型中的其他判别器对该文本信息进行分析识别得到的概率值要大。例如,当文本分析模型包括两个判别器(如第一判别器和第二判别器),且第一判别器用于识别“视频类”的文本信息,第二判别器用于识别“儿歌类”的文本信息时,当文本信息为“家庭儿歌视频”,且“家庭儿歌视频”的意图类别为“儿歌类”时,第二判别器对“家庭儿歌视频”进行分析识别后输出的概率值将大于第一判别器对“家庭儿歌视频”进行分析识别后输出的概率值;当文本信息为“视频广告跳过”,且“视频广告跳过”的意图类别为“视频类”时,第一判别器对“视频广告跳过”进行分析识别后输出的概率值将大于第二判别器对“视频广告跳过”进行分析识别后输出的概率值。
在一种实现方式中,不同分类类别的文本信息中的同一特征词,在文本分析模型包括的不同判别器中的特征权重值不同,由于文本分析模型的判别器是通过特征权重值对不同分类类别的文本信息进行分析识别的,不同分类类别的文本信息中的同一特征词,在文本分析模型包括的不同判别器中的特征权重值不同,使得文本分析模型的不同判别器根据不同的特征权重值可以准确识别出文本信息所属的分类类别。具体的,服务设备调用文本分析模型对文本信息进行分析识别后,可以得到目标分类类别,也即该文本信息的意图类别,若文本分析模型输出的目标分类类别不同,则服务设备确定的目标判别器不同,由于各个判别器在对文本信息进行分析识别时针对文本信息中的各个特征词所使用的特征权重值均不相同。因此,服务设备基于文本分析模型输出的目标分类类别确定文本信息中的各个特征词的词权重值,可以使得同一特征词在属于不同意图类别的文本信息中时确定出的词权重值不同,采用该词权重值可以得到更加符合用户搜索需求的搜索结果。
例如,当文本信息为查询语时,基于实验数据可以得出:特征词为“视频”或者“皮肤”,且该特征词在属于不同意图类别(即分类类别)的查询语中时,该特征词对于用户真正需要获取的内容的重要度不同,特征词“视频”和“皮肤”在属于不同意图类别的查询语中时确定出的词权重值分别如表2和表3所示。
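Table 2: Word weight values determined for the feature word "video" (视频) in queries belonging to different intent categories
[Table 2 appears as image PCTCN2019091387-appb-000001 in the original publication]
Table 3: Word weight values determined for the feature word "skin" (皮肤) in queries belonging to different intent categories
[Table 3 appears as image PCTCN2019091387-appb-000002 in the original publication]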
由表2可知,4个查询语中虽然均包括特征词“视频”,但是4个查询语所属的意图类别是不同的,且特征词“视频”在属于不同意图类别的查询语中时,得到的词权重值不同。且仅在查询语为“视频广告跳过”时,即所属的意图类别为视频类时,特征词“视频”的词权重值最高。若不能识别查询语的意图类别,将使得特征词在属于不同意图类别的查询语中时,具有相同的词权重值。例如,若输入的查询语为“家庭儿歌视频”时,得到的特征词“视频”的词权重值与特征词“视频”在查询语“视频广告跳过”中时的词权重值相同,为0.726,此时,特征词“视频”的词权重值远远大于查询语“家庭儿歌视频”中的其他特征词的词权重值,这将导致输入查询语“家庭儿歌视频”搜索得到的结果大部分为视频类的应用程序,并且由于视频类的应用程序数量较多,而儿歌类的应用程序数量较少,因此很容易将视频类应用程序排在前面展现给用户,使得用户输入查询语“家庭儿歌视频”却不能获取真正需要的儿歌类应用程序。基于表3可以得出与表2一致的结论。
可见,调用文本分析模型,以便通过文本分析模型中的各个判别器对文本信息进行分析识别,可以得到该文本信息的真实意图类别,进而获取在该意图类别下该文本信息中的各个特征词的词权重值,可以提高特征词的词权重值的准确度,基于该特征词的词权重值对文本信息进行搜索,可以得到更加符合用户真实需求的搜索结果。
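Step S404: The service device determines the feature weight values that the target discriminator used for the feature words of the text information during analysis and recognition as the word weight values of the corresponding feature words. The discriminators of the text analysis model analyze and recognize text information through feature weight values; different feature weight values allow text information to be classified into different categories, and it is through the feature weights that the discriminators distinguish text information belonging to different classification categories. Determining the feature weight values used by the target discriminator as the word weight values of the corresponding feature words means that, when the text information is searched using these word weight values, search results that do not belong to the target classification category can be filtered out and results belonging to the target classification category obtained, that is, search results that truly match the user's search needs.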
在一种实现方式中,各个判别器使用的特征权重值可以是各个判别器在训练过程中确定的。在训练过程中,服务设备可以获取大量的历史查询信息,然后对各个历史查询信息进行分词处理,得到各个历史查询信息的特征词,所有历史查询信息的特征词可以组成特征词词典。服务设备可以将特征词词典中的每一个特征词作为一维特征,在确定训练数据时,服务设备可以获取查询信息中的每个特征词的编码,然后根据该查询信息的所有特征词的编码组合,得到该查询信息的特征向量,并将特征向量输入至预设模型进行训练。预设模型中的初始判别器预先为特征词词典中的每一维特征设置了初始特征权重值,初始判别器可以基于为查询信息中的各个特征词设置的初始特征权重值对查询信息进行分析识别。在一种实现方式中,服务设备还可以为特征词词典中的每个词设置唯一的特征标识,在确定训练数据时,服务设备可以基于特征词词典,得到历史查询信息中的每个特征词的特征标识,然后将历史查询信息中的每个特征词的特征标识作为训练数据输入至预设模型进行训练。预设模型中的初始判别器预先为特征词词典中的每一个特征标识设置了初始特征权重值,初始判别器可以基于为查询信息中的各个特征词设置的初始特征权重值对查询信息进行分析识别。通过将查询信息的每个特征词直接作为特征,不用人工设计特征,可以有效降低模型的训练成本。
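Building the feature word dictionary and encoding a query as a feature vector can be sketched as follows. The binary (0/1) encoding and the two historical queries are assumptions of this sketch; the source only says that each dictionary word is one feature dimension and that the codes of a query's feature words are combined into its feature vector.

```python
def build_dictionary(segmented_queries):
    """Assign each distinct feature word the next free dimension index."""
    dictionary = {}
    for words in segmented_queries:
        for word in words:
            dictionary.setdefault(word, len(dictionary))
    return dictionary


def encode(words, dictionary):
    """Binary feature vector with a 1 in the dimension of every known word."""
    vector = [0] * len(dictionary)
    for word in words:
        index = dictionary.get(word)
        if index is not None:  # words outside the dictionary are skipped
            vector[index] = 1
    return vector


# Hypothetical historical queries, already segmented into feature words.
history = [["家庭", "儿歌", "视频"], ["视频", "广告", "跳过"]]
dictionary = build_dictionary(history)
vector = encode(["儿歌", "视频"], dictionary)
print(dictionary, vector)
```

The index stored in `dictionary` can also serve as the unique feature identifier mentioned in the alternative implementation, in which case the list of indices is fed to the model directly.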
在一种实现方式中,服务设备基于目标判别器在分析识别时针对查询信息中的各个特征词所使用的特征权重值,确定查询信息中的各个特征词的词权重值的具体实施方式可以为:服务设备判断查询信息的各个特征词是否存在于特征词词典中,若查询信息的第一特征词存在于特征词词典,则获取目标判别器在分析识别时针对第一特征词所使用的特征权重值,并将该特征权重值确定为第一特征词的词权重值;若查询信息的第二特征词不存在于特征词词典,则获取默认值或者第二特征词的逆文本频率指数,并将默认值或者第二特征词的逆文本频率指数确定为第二特征词的词权重值。其中,默认值可以是由服务设备默认设置的,也可以是服务设备根据预先设定的经验值设置的,本申请实施例对此不作限定。需要说明的是,当文本信息为查询信息时,由于查询信息包括的词较少,因此查询信息中的各个特征词在查询信息中的出现频率基本相同,因此词频对于区分不同特征词在查询信息中的重要程度没有帮助,所以当查询信息的第二特征词不存在于特征词词典时,服务设备可以将第二特征词的逆文本频率指数确定为第二特征词的词权重值。
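The fallback for feature words that are missing from the dictionary can be sketched as below. The IDF formula used, log(N / (1 + df)), is one common variant chosen for this sketch; the source does not give a formula. The weights and the document collection are made up.

```python
import math


def inverse_document_frequency(word, documents):
    """log(N / (1 + df)), where df is the number of documents containing the word."""
    if not documents:
        return 0.0
    containing = sum(1 for doc in documents if word in doc)
    return math.log(len(documents) / (1 + containing))


def resolve_word_weight(word, target_weights, documents, default=None):
    """Use the target discriminator's weight when the word is in the dictionary;
    otherwise use the default value if one is given, else the word's IDF."""
    if word in target_weights:
        return target_weights[word]
    if default is not None:
        return default
    return inverse_document_frequency(word, documents)


# Made-up feature weights of the target discriminator and a tiny corpus.
target_weights = {"儿歌": 2.1, "视频": 0.4}
documents = [["家庭", "儿歌", "视频"], ["视频", "广告", "跳过"], ["儿歌", "大全"]]

known = resolve_word_weight("儿歌", target_weights, documents)
unknown = resolve_word_weight("碰碰", target_weights, documents)
fixed = resolve_word_weight("碰碰", target_weights, documents, default=0.5)
print(known, unknown, fixed)
```

Because the trained weights and an IDF value are on different scales, a practical system would probably need to rescale one of them; that step is outside what the source describes.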
步骤S405:服务设备基于文本信息和该文本信息的各个特征词的词权重值进行信息处理。
为了了解采用图4所述方法得到的词权重值的应用效果,以图4所述方法应用于搜索系统,并且文本信息为查询语进行说明,将基于IDF方法得到的词权重作为对比,对查询语的搜索效果进行了测试,基于图4所述方法得到的词权重值和基于IDF方法得到的词权重值对查询语进行搜索得到的查询结果可以如表4所示。
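When the query is "熊出没碰碰汽车", what the user actually needs is applications related to "熊出没". As Table 4 shows, searching the query with the word weight values obtained by the method of FIG. 4 returns applications related to "熊出没", whereas searching the query with the word weight values obtained by the IDF method returns applications related to "碰碰". The method proposed in the embodiments of this application can therefore effectively improve the accuracy of the query results.
Table 4: Query results obtained by searching the query with the word weight values from the method of FIG. 4 and with the word weight values from the IDF method
[Table 4 appears as image PCTCN2019091387-appb-000003 in the original publication]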
可见,通过实施本申请实施例,不仅可以确定文本信息的真实意图类别,还可以获取在文本信息的真实意图类别下,文本信息的各个特征词的词权重值,基于获取的各个词权重值和文本信息进行信息处理,可以得到更加符合用户需求的信息处理结果。
上述详细阐述了本申请实施例的方法,下面提供了本申请实施例的装置。
请参见图5,图5是本申请实施例提供的一种信息处理装置的结构示意图,信息处理装置50用于执行图2-图4对应的方法实施例中服务设备所执行的步骤,信息处理装置50可以包括:
获取模块501,用于获取文本信息;
分析模块502,用于调用文本分析模型对文本信息进行分析识别,并获取文本分析模型的输出结果;
获取模块501,还用于根据输出结果,获取文本分析模型在分析识别时针对文本信息中的各个特征词所使用的特征权重值;
确定模块503,用于基于获取的各个特征权重值,确定文本信息中的各个特征词的词权重值。
在一种实现方式中,文本分析模型可以包括判别器,文本分析模型是通过判别器对文本信息进行分析识别的;获取模块501具体用于根据输出结果从文本分析模型包括的判别器中确定出目标判别器,并获取目标判别器在分析识别时针对文本信息中的各个特征词所使用的特征权重值。
在一种实现方式中,文本分析模型可以为分类模型,文本分析模型可以包括多个判别器,每一个判别器可以对应一个分类类别,获取模块501用于根据输出结果从文本分析模型包括的判别器中确定出目标判别器时,具体用于将与文本分析模型的输出结果包括的目标分类类别对应的判别器确定为目标判别器,其中,目标分类类别是根据文本分析模型的各个判别器对文本信息进行分析后得到的识别结果确定的。
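In one implementation, the text analysis model may include multiple discriminators, the recognition result of each discriminator may be a probability value, the output result may include a target probability value, and the target probability value may be the maximum of the probability values output by the discriminators of the text analysis model; when the acquisition module 501 is configured to determine the target discriminator from the discriminators included in the text analysis model according to the output result, it is specifically configured to determine the discriminator that outputs the target probability value as the target discriminator.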
在一种实现方式中,文本分析模型可以包括多个判别器,每一个判别器可以对应一个标识;获取模块501用于根据输出结果从文本分析模型包括的判别器中确定出目标判别器时,具体用于将与文本分析模型的输出结果包括的目标标识对应的判别器确定为目标判别器,其中,目标标识是根据文本分析模型的各个判别器对文本信息进行分析识别后得到的识别结果确定的。
在一种实现方式中,确定模块503用于基于获取的各个特征权重值,确定文本信息中的各个特征词的词权重值时,具体用于将针对文本信息中的各个特征词所使用的特征权重值作为文本信息中的相应特征词的词权重值。
在一种实现方式中,文本分析模型包括的各个判别器可以用于识别不同分类类别的文本信息,不同分类类别的文本信息中的同一特征词,在文本分析模型包括的不同判别器中的特征权重值不同。
在一种实现方式中,分析模块502具体用于对文本信息进行分词处理,得到文本信息的各个特征词;并将文本信息的各个特征词作为文本分析模型的输入,得到文本分析模型的输出结果。
在一种实现方式中,信息处理装置50还可以包括训练模块504,用于获取训练样本数据,训练样本数据包括历史文本信息和标注信息;并基于历史文本信息和标注信息,对预设模型进行训练,得到前述文本分析模型。
需要说明的是,图5对应的实施例中未提及的内容以及各个模块执行步骤的具体实现方式可参见图2-图4所示实施例以及前述内容,这里不再赘述。
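In one implementation, the functions implemented by the modules in FIG. 5 may be implemented by a processor in combination with a network interface. Referring to FIG. 6, FIG. 6 is a schematic structural diagram of a service device provided by an embodiment of this application. The service device 60 may include a network interface 601, a processor 602, and a memory 603, which may be connected to one another through one or more communication buses or in other ways. The functions implemented by the acquisition module 501, the analysis module 502, the determination module 503, and the training module 504 shown in FIG. 5 may be implemented by the same processor 602 or by multiple different processors 602.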
网络接口601可以用于发送数据和/或信令,以及接收数据和/或信令。应用在本申请实施例中,网络接口601可以用于获取文本信息。
处理器602被配置为执行图2-图4所述方法中服务设备相应的功能。该处理器602可以包括一个或多个处理器,例如该处理器602可以是一个或多个中央处理器(central processing unit,CPU),网络处理器(network processor,NP),硬件芯片或者其任意组合。在处理器602是一个CPU的情况下,该CPU可以是单核CPU,也可以是多核CPU。
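The memory 603 is configured to store program code and the like. The memory 603 may include a volatile memory, such as a random access memory (RAM); the memory 603 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 603 may further include a combination of the above types of memory.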
处理器602可以调用存储器603中存储的程序代码以执行以下操作:
获取文本信息;
调用文本分析模型对文本信息进行分析识别,并获取文本分析模型的输出结果;
根据输出结果,获取文本分析模型在分析识别时针对文本信息中的各个特征词所使用的特征权重值;
基于获取的各个特征权重值,确定文本信息中的各个特征词的词权重值。
在一种实现方式中,文本分析模型可以包括判别器,文本分析模型是通过判别器对文本信息进行分析识别的;处理器602执行根据输出结果,获取文本分析模型在分析识别时针对文本信息中的各个特征词所使用的特征权重值时,具体可以执行以下操作:根据输出结果从文本分析模型包括的判别器中确定出目标判别器,并获取目标判别器在分析识别时针对文本信息中的各个特征词所使用的特征权重值。
在一种实现方式中,前述文本分析模型可以为分类模型,文本分析模型可以包括多个判别器,每一个判别器对应一个分类类别;处理器602执行根据输出结果从文本分析模型包括的判别器中确定出目标判别器时,具体可以执行以下操作:将与文本分析模型的输出结果包括的目标分类类别对应的判别器确定为目标判别器,其中,目标分类类别是根据文本分析模型的各个判别器对文本信息进行分析后得到的识别结果确定的。
在一种实现方式中,文本分析模型可以包括多个判别器,每一个判别器进行分析识别的识别结果可以为一个概率值,前述输出结果可以包括目标概率值,目标概率值可以为文本分析模型的各个判别器输出的概率值中的最大概率值;处理器602执行根据输出结果从文本分析模型包括的判别器中确定出目标判别器时,具体可以执行以下操作:将输出目标概率值的判别器确定为目标判别器。
在一种实现方式中,文本分析模型可以包括多个判别器,每一个判别器可以对应一个标识;处理器602执行根据输出结果从文本分析模型包括的判别器中确定出目标判别器时,具体可以执行以下操作:将与文本分析模型的输出结果包括的目标标识对应的判别器确定为目标判别器,其中,目标标识是根据文本分析模型的各个判别器对文本信息进行分析识别后得到的识别结果确定的。
在一种实现方式中,处理器602执行基于获取的各个特征权重值,确定文本信息中的各个特征词的词权重值时,具体可以执行以下操作:将针对文本信息中的各个特征词所使用的特征权重值作为文本信息中的相应特征词的词权重值。
在一种实现方式中,文本分析模型包括的各个判别器可以用于识别不同分类类别的文本信息,不同分类类别的文本信息中的同一特征词,在文本分析模型包括的不同判别器中的特征权重值可以不同。
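In one implementation, when the processor 602 calls the text analysis model to analyze and recognize the text information and obtains the output result of the text analysis model, it may specifically perform the following operations: performing word segmentation on the text information to obtain the feature words of the text information; and using the feature words of the text information as the input of the text analysis model to obtain the output result of the text analysis model.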
在一种实现方式中,处理器602还可以执行以下操作:获取训练样本数据,训练样本数据包括历史文本信息和标注信息;并基于历史文本信息和标注信息,对预设模型进行训练,得到前述文本分析模型。
进一步地,处理器602还可以执行图2-图4所示实施例中服务设备对应的操作,具体可参见方法实施例中的描述,在此不再赘述。
本申请实施例还提供一种计算机可读存储介质,可以用于存储图5所示实施例中信息处理装置所用的计算机软件指令,其包含用于执行上述实施例中为服务设备所设计的程序。
上述计算机可读存储介质包括但不限于快闪存储器、硬盘、固态硬盘。
本申请实施例还提供一种计算机程序产品,该计算机产品被计算设备运行时,可以执行上述图2-图4实施例中为服务设备所设计的方法。
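An embodiment of this application further provides a chip, including a processor and a memory, where the memory is configured to store a computer program, and the processor is configured to call the computer program from the memory and run it; the computer program is used to implement the methods in the foregoing method embodiments.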
本领域普通技术人员可以意识到,结合本申请中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
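The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive (SSD)).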
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (20)

  1. 一种信息处理方法,其特征在于,所述方法包括:
    获取文本信息;
    调用文本分析模型对所述文本信息进行分析识别,并获取所述文本分析模型的输出结果;
    根据所述输出结果,获取所述文本分析模型在分析识别时针对所述文本信息中的各个特征词所使用的特征权重值;
    基于获取的各个特征权重值,确定所述文本信息中的各个特征词的词权重值。
  2. 根据权利要求1所述的方法,其特征在于,所述文本分析模型包括判别器,所述文本分析模型是通过所述判别器对所述文本信息进行分析识别的;所述根据所述输出结果,获取所述文本分析模型在分析识别时针对所述文本信息中的各个特征词所使用的特征权重值,包括:
    根据所述输出结果从所述文本分析模型包括的判别器中确定出目标判别器,并获取所述目标判别器在分析识别时针对所述文本信息中的各个特征词所使用的特征权重值。
  3. 根据权利要求2所述的方法,其特征在于,所述文本分析模型为分类模型,所述文本分析模型包括多个判别器,每一个判别器对应一个分类类别;
    所述根据所述输出结果从所述文本分析模型包括的判别器中确定出目标判别器,包括:
    将与所述文本分析模型的输出结果包括的目标分类类别对应的判别器确定为目标判别器,其中,所述目标分类类别是根据所述文本分析模型的各个判别器对所述文本信息进行分析后得到的识别结果确定的。
  4. 根据权利要求2所述的方法,其特征在于,所述文本分析模型包括多个判别器,每一个判别器进行分析识别的识别结果为一个概率值,所述输出结果包括目标概率值,所述目标概率值为所述文本分析模型的各个判别器输出的概率值中的最大概率值;
    所述根据所述输出结果从所述文本分析模型包括的判别器中确定出目标判别器,包括:
    将输出所述目标概率值的判别器确定为目标判别器。
  5. 根据权利要求2所述的方法,其特征在于,所述文本分析模型包括多个判别器,每一个判别器对应一个标识;
    所述根据所述输出结果从所述文本分析模型包括的判别器中确定出目标判别器,包括:
    将与所述文本分析模型的输出结果包括的目标标识对应的判别器确定为目标判别器,其中,所述目标标识是根据所述文本分析模型的各个判别器对所述文本信息进行分析识别后得到的识别结果确定的。
  6. 根据权利要求1~5任一项所述的方法,其特征在于,所述基于获取的各个特征权重值,确定所述文本信息中的各个特征词的词权重值,包括:
    将所述针对所述文本信息中的各个特征词所使用的特征权重值作为所述文本信息中的相应特征词的词权重值。
  7. 根据权利要求3所述的方法,其特征在于,所述文本分析模型包括的各个判别器用于识别不同分类类别的文本信息,不同分类类别的文本信息中的同一特征词,在所述文本分析模型包括的不同判别器中的特征权重值不同。
  8. 根据权利要求1~5任一项所述的方法,其特征在于,所述调用文本分析模型对所述文本信息进行分析识别,并获取所述文本分析模型的输出结果,包括:
    对所述文本信息进行分词处理,得到所述文本信息的各个特征词;
    将所述文本信息的各个特征词作为所述文本分析模型的输入,得到所述文本分析模型的输出结果。
  9. 根据权利要求1~5任一项所述的方法,其特征在于,所述方法还包括:
    获取训练样本数据,所述训练样本数据包括历史文本信息和标注信息;
    基于所述历史文本信息和所述标注信息,对预设模型进行训练,得到所述文本分析模型。
  10. 一种信息处理装置,其特征在于,所述装置包括:
    获取模块,用于获取文本信息;
    分析模块,用于调用文本分析模型对所述文本信息进行分析识别,并获取所述文本分析模型的输出结果;
    所述获取模块,还用于根据所述输出结果,获取所述文本分析模型在分析识别时针对所述文本信息中的各个特征词所使用的特征权重值;
    确定模块,用于基于获取的各个特征权重值,确定所述文本信息中的各个特征词的词权重值。
  11. 根据权利要求10所述的装置,其特征在于,所述文本分析模型包括判别器,所述文本分析模型是通过所述判别器对所述文本信息进行分析识别的;
    所述获取模块具体用于根据所述输出结果从所述文本分析模型包括的判别器中确定出目标判别器,并获取所述目标判别器在分析识别时针对所述文本信息中的各个特征词所使用的特征权重值。
  12. 根据权利要求11所述的装置,其特征在于,所述文本分析模型为分类模型,所述文本分析模型包括多个判别器,每一个判别器对应一个分类类别;
    所述获取模块用于根据所述输出结果从所述文本分析模型包括的判别器中确定出目标判别器时,具体用于将与所述文本分析模型的输出结果包括的目标分类类别对应的判别器确定为目标判别器,其中,所述目标分类类别是根据所述文本分析模型的各个判别器对所述文本信息进行分析后得到的识别结果确定的。
  13. 根据权利要求11所述的装置,其特征在于,所述文本分析模型包括多个判别器,每一个判别器进行分析识别的识别结果为一个概率值,所述输出结果包括目标概率值,所述目标概率值为所述文本分析模型的各个判别器输出的概率值中的最大概率值;
    所述获取模块用于根据所述输出结果从所述文本分析模型包括的判别器中确定出目标判别器时,具体用于将输出所述目标概率值的判别器确定为目标判别器。
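  14. The apparatus according to claim 11, wherein the text analysis model includes multiple discriminators, and each discriminator corresponds to one identifier;
    when the acquisition module is configured to determine the target discriminator from the discriminators included in the text analysis model according to the output result, it is specifically configured to determine the discriminator corresponding to the target identifier included in the output result of the text analysis model as the target discriminator, wherein the target identifier is determined according to the recognition results obtained after the discriminators of the text analysis model analyze and recognize the text information.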
  15. 根据权利要求10~14任一项所述的装置,其特征在于,
    所述确定模块用于基于获取的各个特征权重值,确定所述文本信息中的各个特征词的词权重值时,具体用于将所述针对所述文本信息中的各个特征词所使用的特征权重值作为所述文本信息中的相应特征词的词权重值。
  16. 根据权利要求12所述的装置,其特征在于,所述文本分析模型包括的各个判别器用于识别不同分类类别的文本信息,不同分类类别的文本信息中的同一特征词,在所述文本分析模型包括的不同判别器中的特征权重值不同。
  17. 根据权利要求10~14任一项所述的装置,其特征在于,
    所述分析模块具体用于对所述文本信息进行分词处理,得到所述文本信息的各个特征词;并将所述文本信息的各个特征词作为所述文本分析模型的输入,得到所述文本分析模型的输出结果。
  18. 根据权利要求10~14任一项所述的装置,其特征在于,所述装置还包括训练模块;
    所述训练模块,用于获取训练样本数据,所述训练样本数据包括历史文本信息和标注信息;并基于所述历史文本信息和所述标注信息,对预设模型进行训练,得到所述文本分析模型。
  19. 一种服务设备,其特征在于,包括存储器和处理器,所述存储器中存储有程序指令,所述处理器通过总线与所述存储器连接,所述处理器执行所述存储器中存储的程序指令,以使所述服务设备执行如权利要求1~9任一项所述的方法。
  20. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机程序,所述计算机程序包括程序指令,所述程序指令当被处理器执行时使所述处理器执行如权利要求1~9任一项所述的方法。
PCT/CN2019/091387 2018-11-30 2019-06-14 信息处理方法、装置、服务设备及计算机可读存储介质 WO2020107864A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811464550.3A CN109902154A (zh) 2018-11-30 2018-11-30 信息处理方法、装置、服务设备及计算机可读存储介质
CN201811464550.3 2018-11-30

Publications (1)

Publication Number Publication Date
WO2020107864A1 true WO2020107864A1 (zh) 2020-06-04

Family

ID=66943324

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/091387 WO2020107864A1 (zh) 2018-11-30 2019-06-14 信息处理方法、装置、服务设备及计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN109902154A (zh)
WO (1) WO2020107864A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413737B (zh) * 2019-07-29 2022-10-14 腾讯科技(深圳)有限公司 一种同义词的确定方法、装置、服务器及可读存储介质
CN110737773B (zh) * 2019-10-17 2022-06-10 中国联合网络通信集团有限公司 一种基于神经网络的信息分类方法和系统
CN111260435A (zh) * 2020-01-10 2020-06-09 京东数字科技控股有限公司 多因子权重赋值修正方法、装置、计算机设备和存储介质
CN112667779B (zh) * 2020-12-30 2023-09-05 北京奇艺世纪科技有限公司 一种信息查询方法、装置、电子设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101334768A (zh) * 2008-08-05 2008-12-31 北京学之途网络科技有限公司 一种利用计算机对词义进行排歧的方法、系统及检索方法
CN101385025A (zh) * 2005-12-22 2009-03-11 清晰传媒广告有限公司 通过分析内容确定上下文并且基于该上下文提供相关内容
CN102541958A (zh) * 2010-12-30 2012-07-04 百度在线网络技术(北京)有限公司 一种用于识别短文本类别信息的方法、装置和计算机设备
CN104915356A (zh) * 2014-03-13 2015-09-16 中国移动通信集团上海有限公司 一种文本分类校正方法及装置
CN106557508A (zh) * 2015-09-28 2017-04-05 北京神州泰岳软件股份有限公司 一种文本关键词提取方法和装置

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101184259B (zh) * 2007-11-01 2010-06-23 浙江大学 垃圾短信中的关键词自动学习及更新方法
CN106156204B (zh) * 2015-04-23 2020-05-29 深圳市腾讯计算机系统有限公司 文本标签的提取方法和装置
US9940323B2 (en) * 2016-07-12 2018-04-10 International Business Machines Corporation Text classifier operation
CN106599933A (zh) * 2016-12-26 2017-04-26 哈尔滨工业大学 一种基于联合深度学习模型的文本情感分类方法

Also Published As

Publication number Publication date
CN109902154A (zh) 2019-06-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19889453

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19889453

Country of ref document: EP

Kind code of ref document: A1