CN113849729A - Text data processing method, device and medium - Google Patents

Text data processing method, device and medium

Info

Publication number
CN113849729A
CN113849729A
Authority
CN
China
Prior art keywords
text
search
recommended
user
acquiring
Prior art date
Legal status
Pending
Application number
CN202111027394.6A
Other languages
Chinese (zh)
Inventor
张凯磊
黄晓烽
佟娜
孟莹
刘智朋
Current Assignee
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN202111027394.6A
Publication of CN113849729A

Classifications

    • G06F 16/9535 Information retrieval; retrieval from the web; querying, e.g. by the use of web search engines; search customisation based on user profiles and personalisation
    • G06F 40/194 Handling natural language data; text processing; calculation of difference between files
    • G06F 40/30 Handling natural language data; semantic analysis
    • G06N 20/00 Machine learning
    • G06N 3/04 Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology
    • G06N 3/08 Computing arrangements based on biological models; neural networks; learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application provides a text data processing method, a text data processing device and a text data processing medium, relating to the technical field of computers and artificial intelligence. The method comprises the following steps: acquiring a search text input by a user to be pushed; when it is detected that the text type of the search text belongs to a text type defined in a preset word list, acquiring an associated text set associated with the search text, wherein the associated text set comprises associated texts; selecting at least one associated text from the associated text set as a recommended text, and acquiring text attribute information of the recommended text; and pushing the recommended text and the text attribute information for the user to be pushed. The technical scheme of the embodiment of the application can improve the text search efficiency of the user.

Description

Text data processing method, device and medium
Technical Field
The present application relates to the field of computer and artificial intelligence technologies, and in particular, to a text data processing method, apparatus, and medium.
Background
In a text data processing scenario, such as an application scenario in which search text input by a user is processed, search phrases containing a keyword are generally obtained from the keyword in the user's search text and displayed to the user for selection. However, in this scheme, when the user needs to search for other text, the user still has to re-enter that text, which makes the user's text search inefficient. How to improve the efficiency of the user's text search is therefore a technical problem that urgently needs to be solved.
Disclosure of Invention
Embodiments of the present application provide a text data processing method, an apparatus, a computer program product or computer program, and a computer readable medium, which can improve the efficiency of a user's text search at least to a certain extent.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of an embodiment of the present application, there is provided a text data processing method, including: acquiring a search text input by a user to be pushed; when it is detected that the text type of the search text belongs to a text type defined in a preset word list, acquiring an associated text set associated with the search text, wherein the associated text set comprises associated texts; selecting at least one associated text from the associated text set as a recommended text, and acquiring text attribute information of the recommended text; and pushing the recommended text and the text attribute information for the user to be pushed.
According to an aspect of an embodiment of the present application, there is provided a text data processing apparatus including: a first acquisition unit used for acquiring a search text input by a user to be pushed; the second acquisition unit is used for acquiring an associated text set associated with the search text when detecting that the text type of the search text belongs to the text type defined in a preset word list, wherein the associated text set comprises associated texts; the selecting unit is used for selecting at least one associated text from the associated text set as a recommended text and acquiring text attribute information of the recommended text; and the pushing unit is used for pushing the recommended text and the text attribute information for the user to be pushed.
In some embodiments of the present application, based on the foregoing solution, the pushing unit is configured to: acquiring a recommendation reference value for each recommended text, wherein the recommendation reference value is used for representing the searched popularity of the recommended text; and displaying the recommended text, and a recommended reference value and text attribute information corresponding to the recommended text in an interface.
In some embodiments of the present application, based on the foregoing scheme, the second obtaining unit is configured to: identifying a target text type of the search text; and acquiring the associated text associated with the search text based on the target text type to obtain the associated text set.
In some embodiments of the present application, based on the foregoing solution, the second obtaining unit is further configured to: acquiring a text input by the user to be pushed within a preset historical time interval, and taking the text as a to-be-selected associated text; and determining the associated text from the associated texts to be selected to obtain the associated text set.
In some embodiments of the present application, based on the foregoing solution, the second obtaining unit is further configured to: acquiring a text similar to the search text in a first text characteristic as a to-be-selected associated text, wherein the first text characteristic comprises at least one of a semantic characteristic, a writing form characteristic and a pronunciation characteristic; and determining the associated text from the associated texts to be selected to obtain the associated text set.
In some embodiments of the present application, based on the foregoing solution, the second obtaining unit is further configured to: acquiring a text input by the user to be pushed in a preset historical time interval and a text similar to the search text in a second text characteristic, wherein the text is used as a related text to be selected, and the second text characteristic comprises at least one of a semantic characteristic, a writing form characteristic and a pronunciation characteristic; and determining the associated text from the associated texts to be selected to obtain the associated text set.
In some embodiments of the present application, based on the foregoing solution, the second obtaining unit includes: and the filtering unit is used for determining repeated texts in the associated texts to be selected and filtering the repeated texts to obtain the associated texts.
In some embodiments of the present application, based on the foregoing solution, the selecting unit is configured to: sequencing each associated text in the associated text set through a pre-trained sequencing model to obtain reference sequencing information corresponding to each associated text, wherein the reference sequencing information is used for representing the associated degree of the associated text; and selecting at least one associated text from the associated text set as a recommended text based on the reference sorting information.
In some embodiments of the present application, based on the foregoing, the selecting unit is further configured to: acquiring third text characteristics corresponding to the search text and each associated text, wherein the third text characteristics comprise at least one of semantic characteristics, writing form characteristics and pronunciation characteristics; determining recommendation indexes corresponding to the associated texts through the ranking model based on third text characteristics corresponding to the search texts and the associated texts; and sequencing each associated text in the associated text set through the recommendation index.
In some embodiments of the present application, based on the foregoing solution, the ranking model includes a first sub-model and a second sub-model, the first sub-model has an ability to memorize a third text feature corresponding to the search text, the second sub-model has an ability to generalize the third text feature corresponding to the search text, and the selecting unit is further configured to: inputting the search text and the third text characteristics corresponding to each associated text into a first sub-model to output a first recommendation index corresponding to each associated text; inputting the search text and third text characteristics corresponding to each associated text into a second submodel to output a second recommendation index corresponding to each associated text; and performing weighted calculation on the first recommendation index and the second recommendation index for each associated text to obtain the recommendation index corresponding to the associated text.
In some embodiments of the present application, based on the foregoing scheme, the first sub-model includes a bilinear transformation model, and the second sub-model is a neural network model.
In some embodiments of the present application, based on the foregoing solution, the apparatus further includes: the training unit is used for acquiring behavior data generated by the user to be pushed according to the pushed recommended text and the text attribute information after the recommended text and the text attribute information are pushed for the user to be pushed; and retraining the ranking model based on the behavior data to obtain a retrained ranking model, wherein the retrained ranking model is used for ranking the next associated text.
In some embodiments of the present application, based on the foregoing solution, the training unit is configured to: determining expected sequencing information corresponding to each associated text based on the behavior data; and correcting hidden layer parameters in the sequencing model through gradient back-propagation by comparing the expected sequencing information with the reference sequencing information.
According to an aspect of embodiments herein, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the text data processing method as described in the above embodiments.
According to an aspect of the embodiments of the present application, there is provided a text data processing apparatus, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing the text data processing method as described in the above embodiments.
According to an aspect of embodiments of the present application, there is provided a computer-readable storage medium having at least one program code stored therein, the at least one program code being loaded and executed by a processor to implement the operations performed by the text data processing method as described in the above embodiments.
In the technical solutions provided in some embodiments of the present application, when it is detected that the text type of the user's input search text belongs to a text type defined in a preset vocabulary, an associated text set associated with the search text is obtained; at least one associated text may be selected from the associated text set as a recommended text, and text attribute information of the recommended text is obtained, so that the recommended text and the text attribute information are pushed for the user to be pushed. Since the recommended text and corresponding text attribute information associated with the search text are pushed to the user, a reference for searching other associated texts is provided; for example, while the user searches for one Chinese text, other Chinese texts associated with it can be provided at the same time. Therefore, when the user needs to search other associated texts, the user does not need to input a search text again, which saves the user's search time and improves the user's text search efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 shows a schematic diagram of an exemplary system architecture to which aspects of embodiments of the present application may be applied;
FIG. 2 shows a flow diagram of a text data processing method according to one embodiment of the present application;
FIG. 3 shows a detailed flowchart for pushing the recommended text and the text attribute information for the user to be pushed according to an embodiment of the present application;
FIG. 4 shows an interface diagram of an application text data processing method according to an embodiment of the present application;
FIG. 5 illustrates a detailed flow diagram for selecting at least one associated text from the set of associated texts as a recommended text according to an embodiment of the present application;
FIG. 6 is a detailed flowchart illustrating ranking of respective associated texts in the associated text set by a pre-trained ranking model according to an embodiment of the present application;
FIG. 7 illustrates a detailed flow diagram for determining recommendation indices for respective associated texts via the ranking model according to an embodiment of the present application;
FIG. 8 illustrates a model diagram of the ranking model according to one embodiment of the present application;
FIG. 9 shows a flowchart of a method after the recommended text and the text attribute information are pushed for the user to be pushed according to an embodiment of the present application;
FIG. 10 shows a block diagram of a text data processing apparatus according to an embodiment of the present application;
FIG. 11 shows a block diagram of a text data processing device according to an embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It should be noted that: reference herein to "a plurality" means two or more. "And/or" describes the association relationship of the associated objects, meaning that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
It is noted that the terms first, second and the like in the description and claims of the present application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the objects so used are interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than those illustrated or described herein.
Embodiments in the present application relate to techniques related to artificial intelligence, i.e., fully automated processing of data (e.g., text data) is achieved through artificial intelligence. Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied.
As shown in fig. 1, the system architecture may include a terminal device (such as one or more of the smart phone 101, the tablet computer 102, and the portable computer 103 shown in fig. 1, and certainly may be a desktop computer, etc., but is not limited thereto, and the present application is not limited thereto), a network 104, and a server 105. The network 104 serves as a medium for providing communication links between terminal devices and the server 105. Network 104 may include various connection types, such as wired communication links, wireless communication links, and so forth.
In an embodiment of the application, when a user needs to query or search for a text, the user may input a search text on a terminal device. The server 105 acquires the search text input on the terminal device through the network 104, and, when it is detected that the text type of the search text belongs to a text type defined in a preset word list, acquires an associated text set including associated texts associated with the search text. The server 105 then selects at least one associated text from the associated text set as a recommended text and acquires text attribute information of the recommended text, and finally pushes the recommended text and the text attribute information for the user to be pushed through the terminal device.
In this implementation, when data related to the search text is provided for the user, the recommended text and text attribute information associated with the search text are also pushed for the user, so that the user has more text data to select from and the user's text search efficiency is improved.
It should be noted that the text data processing method provided in the embodiment of the present application may be executed by the server 105, and accordingly, the text data processing apparatus is generally disposed in the server 105. However, in other embodiments of the present application, the terminal device may also have a similar function as the server, so as to execute the text data processing scheme provided by the embodiments of the present application.
It should also be noted that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. According to implementation needs, the server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, a big data and artificial intelligence platform, and the like.
It should be explained that the cloud computing mentioned above is a computing model that distributes computing tasks over a large pool of computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". Resources in the cloud appear infinitely expandable to users and can be acquired at any time, used on demand, and expanded at any time. The cloud computing resource pool mainly comprises computing equipment (virtualized machines, each including an operating system), storage equipment and network equipment.
The implementation details of the technical solution of the embodiment of the present application are set forth in detail below:
FIG. 2 shows a flow diagram of a text data processing method according to one embodiment of the present application. Referring to fig. 2, the text data processing method at least includes steps 210 to 270, which are described in detail as follows:
step 210, obtaining a search text input by a user to be pushed.
In this application, the user to be pushed may refer to a user who needs to query information related to a search text, where the search text may be a character, a word, or a sentence. For example, the user inputs the character "pan" and wishes to query attribute information related to it (e.g., its pronunciation, definition, etc.); the user inputs the word "assiduously" and wants to search for attribute information related to it (e.g., its explanation, synonyms, etc.); or the user inputs the proverb "Wanggao sells melons and praises them himself" and wants to search for attribute information related to the proverb (e.g., its origin, etc.).
Step 230, when it is detected that the text type of the search text belongs to a text type defined in a preset word list, acquiring an associated text set associated with the search text, where the associated text set includes associated texts.
In the application, the text data processing method can be applied to search scenarios for Chinese text, for example, when a user needs to search for a certain word, a certain idiom, or a certain proverb or poem.
Specifically, a preset vocabulary may be constructed in advance; the preset vocabulary stores previously collected Chinese texts of various text types, such as single-character texts, idiom texts, and proverb or poem texts. It is understood that the text types defined by the preset vocabulary may include at least one of a single-character type, an idiom type, a proverb type, and a poetry type.
Therefore, when the text type of the search text is detected to belong to the text type defined in the preset word list, the associated text set associated with the search text is obtained.
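By way of non-limiting illustration only, the type check against the preset word list might be sketched as follows; the dictionary entries, type names and function names are assumptions of this description, not elements of the claimed method:
    # Non-limiting sketch: the preset word list is assumed to map each collected
    # text to its text type; entries and type names are illustrative placeholders.
    PRESET_WORD_LIST = {
        "long": "single_character",
        "assiduously": "idiom",
        "Wanggao sells melons and praises them himself": "proverb",
    }

    DEFINED_TYPES = {"single_character", "idiom", "proverb", "poetry"}

    def detect_text_type(search_text: str):
        """Return the text type defined in the preset word list, or None."""
        return PRESET_WORD_LIST.get(search_text)

    def triggers_recommendation(search_text: str) -> bool:
        """Only search texts whose type is defined in the preset word list
        trigger acquisition of the associated text set."""
        return detect_text_type(search_text) in DEFINED_TYPES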
In this application, an associated text in the associated text set is a text that has an association relationship with the search text, for example, semantic similarity, adjacency in input time, or similarity in pronunciation.
Step 250, selecting at least one associated text from the associated text set as a recommended text, and acquiring text attribute information of the recommended text.
It can be understood that the associated text set may contain multiple associated texts; some of the associated texts may be selected as recommended texts, or all of them may be used as recommended texts.
In the application, the type of the text attribute information of the recommended text may be the same as the type of the text attribute information of the search text input by the user. For example, if the text attribute information of the search text "pan" input by the user is the pronunciation and definition of "pan", the text attribute information of the recommended text is also a pronunciation and definition.
In the application, the type of the text attribute information of the search text can be determined by identifying the search intention of the user. For example, when the user inputs the search text "pan", the search intention can be identified, according to the user's historical search habits, as querying the pronunciation and definition of "pan".
And 270, pushing the recommended text and the text attribute information for the user to be pushed.
In step 270, the recommended text and the text attribute information are pushed for the user to be pushed, which may be performed according to the steps shown in fig. 3.
Referring to fig. 3, a detailed flowchart for pushing the recommended text and the text attribute information for the user to be pushed according to an embodiment of the present application is shown. Specifically, the method includes steps 271 to 272:
step 271, obtaining a recommendation reference value for each recommended text, where the recommendation reference value is used to represent the searched popularity of the recommended text.
And 272, displaying the recommended text, and the recommended reference value and text attribute information corresponding to the recommended text in an interface.
In order to make the present application better understood by those skilled in the art, a specific application scenario will be described below with reference to fig. 4.
Referring to fig. 4, an interface diagram of an application text data processing method according to an embodiment of the present application is shown.
As shown in fig. 4, after the user inputs the search text 402, the character "long", in the input box of the interface 401, the interface 401 displays the text attribute information related to the character "long", displays its associated texts 403, the characters "normal", "taste", "factory" and "field", and displays the text attribute information corresponding to each of these characters. Further, the recommendation reference value corresponding to each associated text is displayed; for example, the recommendation reference values for the "normal" and "taste" characters are 92%, and those for the "factory" and "field" characters are 88%.
In the method and the device, the recommended text, the corresponding recommended reference value and the text attribute information are displayed while the information related to the text searched by the user is displayed in the interface, so that the reference for searching other text information can be provided for the user, the search text does not need to be input again when the user needs to search other related texts, the search time of the user is saved, and the text search efficiency of the user is improved.
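By way of non-limiting illustration, the information displayed for one recommended text in the interface of fig. 4 could be assembled roughly as follows; the field and function names are assumptions for illustration only:
    # Non-limiting sketch of the data pushed for one recommended text; the field
    # names are assumptions of this description, not terms defined by the patent.
    def build_pushed_item(recommended_text, text_attributes, recommendation_reference):
        """Bundle a recommended text with its attribute information and its
        recommendation reference value (searched popularity, e.g. 0.92 -> "92%")."""
        return {
            "recommended_text": recommended_text,          # e.g. the character "normal"
            "text_attributes": text_attributes,            # e.g. pronunciation and definition
            "recommendation_reference": f"{recommendation_reference:.0%}",
        }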
The above text data processing scheme will be specifically explained by further embodiments.
In one embodiment of step 230 shown in fig. 2, obtaining the associated text set associated with the search text may be performed according to the following steps:
step 1, identifying a target text type of the search text.
And 2, acquiring the associated text associated with the search text based on the target text type to obtain the associated text set.
In the present embodiment, the text type may include a single character type, a word type, an idiom type, a sentence type, and the like.
In this embodiment, based on the target text type, the associated texts associated with the search text are obtained; for example, if the search text is of a word type, the associated texts are also of a word type. Keeping the type of the associated texts consistent with the type of the search text ensures a high degree of association between the associated texts and the search text, which improves the user's text search efficiency in the subsequent process.
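A minimal sketch of this type-based filtering is given below; the helper type_of is an assumption, since the patent does not specify how a text's type is determined:
    # Non-limiting sketch: keep only candidate texts whose type matches the target
    # text type of the search text; `type_of` is an assumed helper that returns the
    # type (single character, word, idiom, sentence, ...) of a text.
    def filter_by_target_type(candidates, target_type, type_of):
        return [text for text in candidates if type_of(text) == target_type]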
In another embodiment of step 230 shown in fig. 2, obtaining the associated text set associated with the search text may be performed according to the following steps:
step 1, acquiring a text input by the user to be pushed within a preset historical time interval, and taking the text as a to-be-selected associated text.
And 2, determining the associated text from the associated texts to be selected to obtain the associated text set.
In this embodiment, the predetermined historical time interval may be the last minute or the last thirty seconds. For example, suppose the user inputs the character "long" in the search engine's input box to query its pronunciation and definition, and within the last minute has also input "heavy", "normal", "how many pronunciations does 'line' have" and "what does 'rise' mean". These input texts may be obtained from the operation log records, and the key texts may be determined from them: for the input text "how many pronunciations does 'line' have", the keyword "line" may be acquired as an associated text, and for the input text "what does 'rise' mean", the keyword "rise" may be acquired as an associated text. Finally, "heavy", "normal", "line" and "rise" are used as the associated texts of "long", and an associated text set composed of "heavy", "normal", "line" and "rise" is obtained.
It should be noted that the predetermined historical time interval may be determined according to actual needs, and is not limited to those listed above.
In this embodiment, since texts input by the user within a short period of time generally have a certain correlation, using the text input by the user in a predetermined historical time interval as the candidate associated text ensures a high degree of correlation between the associated texts and the search text, which further improves the user's text search efficiency in the subsequent process.
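A minimal sketch of collecting candidates from the operation log is given below; the log record format, the sixty-second window and the extract_keyword helper are assumptions of this description:
    import time

    # Non-limiting sketch: collect candidate associated texts from the operation
    # log within the predetermined historical time interval. `log_records` is
    # assumed to be an iterable of (timestamp, input_text) pairs, and
    # `extract_keyword` an assumed helper that pulls the key text out of a longer
    # query such as "how many pronunciations does 'line' have".
    def candidates_from_history(log_records, window_seconds=60, extract_keyword=None):
        now = time.time()
        recent = [text for ts, text in log_records if now - ts <= window_seconds]
        if extract_keyword is not None:
            recent = [extract_keyword(text) for text in recent]
        return recent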
In another embodiment of step 230 shown in fig. 2, obtaining the associated text set associated with the search text may be performed according to the following steps:
step 1, obtaining a text similar to the search text in a first text characteristic as a to-be-selected associated text, wherein the first text characteristic comprises at least one of a semantic characteristic, a writing form characteristic and a pronunciation characteristic.
And 2, determining the associated text from the associated texts to be selected to obtain the associated text set.
In this embodiment, the associated text to be selected may be determined by determining the similarity between the text and the search text on the first text feature, that is, the text with the similarity exceeding the similarity threshold may be used as the associated text to be selected. For example, if the similarity threshold is set to 70%, and the similarity between the text "normal" and the search text "long" on the pronunciation feature is 100%, the text "normal" may be determined as the candidate associated text of the search text "long". If the similarity of the text "high" and the search text "long" on the semantic features is 71%, the text "high" may also be determined as the associated text to be selected of the search text "long".
In this embodiment, a text library may be established for each text, where the text library includes one or more texts similar to the corresponding text, for example, synonyms and antonyms of the text. When it is detected that a user inputs a text, the texts in the similar-text library corresponding to that text may be used as its candidate associated texts.
In this embodiment, texts similar to the search text in the first text feature are used as the associated texts to be selected, which ensures a high degree of association between the associated texts and the search text and improves the user's text search efficiency in the subsequent process.
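A minimal sketch of this threshold-based selection is given below; the similarity scorer is an assumption, since the patent does not specify how the feature similarity is computed:
    # Non-limiting sketch: select candidate associated texts whose similarity to the
    # search text on the first text feature exceeds a threshold. `similarity(a, b)`
    # is an assumed scorer over semantic, written-form and/or pronunciation features
    # returning a value in [0, 1]; the 70% threshold mirrors the example above.
    def candidates_by_similarity(search_text, corpus, similarity, threshold=0.70):
        return [text for text in corpus if similarity(search_text, text) > threshold]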
In another embodiment of step 230 shown in fig. 2, obtaining the associated text set associated with the search text may be performed according to the following steps:
step 1, acquiring a text input by the user to be pushed in a preset historical time interval and a text similar to the search text in a second text characteristic as a related text to be selected, wherein the second text characteristic comprises at least one of a semantic characteristic, a writing shape characteristic and a pronunciation characteristic.
And 2, determining the associated text from the associated texts to be selected to obtain the associated text set.
In this embodiment, the text input by the user to be pushed within the predetermined historical time interval and the text similar to the search text in the second text feature are both used as associated texts to be selected, which expands the number of associated texts and further improves the user's text search efficiency in the subsequent text data processing.
In step 2 of the above embodiments, determining the associated texts from the associated texts to be selected may be performed by first identifying repeated texts among the associated texts to be selected and then filtering out the repeated texts to obtain the associated texts.
In this embodiment, because there may be multiple repeated texts among the associated texts to be selected, and directly processing all of them would increase the text processing load, filtering out the repeated texts reduces the processing load and saves computer resources.
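A minimal sketch of this de-duplication step is given below; the order-preserving strategy is an assumption of this description:
    # Non-limiting sketch: filter repeated texts out of the candidate associated
    # texts while preserving the order in which they were collected.
    def filter_duplicates(candidate_texts):
        seen = set()
        associated_texts = []
        for text in candidate_texts:
            if text not in seen:
                seen.add(text)
                associated_texts.append(text)
        return associated_texts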
In one embodiment of step 250 shown in fig. 2, selecting at least one associated text from the associated text set as the recommended text may be performed according to the steps shown in fig. 5.
Referring to fig. 5, a detailed flow diagram for selecting at least one associated text from the associated text set as a recommended text according to an embodiment of the application is shown. Specifically, the method comprises the following steps 251 and 252:
and 251, ranking each associated text in the associated text set through a pre-trained ranking model to obtain reference ranking information corresponding to each associated text, wherein the reference ranking information is used for representing the associated degree of the associated text.
Step 252, based on the reference sorting information, selecting at least one associated text from the associated text set as a recommended text.
In the present application, the pre-trained ranking model has the ability to determine the degree of association between an associated text and the search text, and this degree of association is characterized by the reference ranking information.
In this application, the reference ranking information may include a ranking position, where a text ranked nearer the top has a higher degree of association; the reference ranking information may also include a ranking weight, where a higher ranking weight of a text indicates a higher degree of association of that text.
In this application, selecting at least one associated text from the associated text set as a recommended text based on the reference ranking information means selecting, according to the reference ranking information, at least one associated text with a higher degree of association from the associated text set as the recommended text.
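A minimal sketch of this selection step is given below; expressing the reference ranking information as numeric ranking weights and taking the top four texts (as in fig. 4) are assumptions of this description:
    # Non-limiting sketch: select the recommended texts by taking the associated
    # texts with the highest reference ranking, here expressed as ranking weights.
    def select_recommended(associated_texts, ranking_weights, top_k=4):
        ranked = sorted(zip(associated_texts, ranking_weights),
                        key=lambda pair: pair[1], reverse=True)
        return [text for text, _ in ranked[:top_k]]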
In this application, texts can be ranked according to the historical click-through rate and search volume of each search text or recommended text; the statistical features, text features and keyword features of each text are mined, the ranking model is trained based on these features, and the trained ranking model is finally applied to rank each associated text.
in one embodiment of step 251 shown in fig. 5, the step of sorting each associated text in the associated text set by a pre-trained sorting model may be performed according to the step shown in fig. 6.
Referring to fig. 6, a detailed flowchart of ranking each associated text in the associated text set through a pre-trained ranking model according to an embodiment of the present application is shown. Specifically, the method comprises the following steps 2511 to 2513:
step 2511, third text features corresponding to the search text and each associated text are obtained, wherein the third text features comprise at least one of semantic features, writing form features and pronunciation features.
Step 2512, based on the third text characteristics corresponding to the search text and each associated text, determining a recommendation index corresponding to each associated text through the ranking model.
Step 2513, sorting each associated text in the associated text set according to the recommendation index.
In the present application, the third text feature may include at least one of a semantic feature, a written-form feature, and a pronunciation feature. These features may be represented as feature vectors, and different features of different texts correspond to different feature vectors; for example, the semantic feature of the text "long" may be represented by the feature vector [24, 12, 34, 31, 10, 31, 45, 30, 10, 12, 24, 5, 9, 24, 12, 24], and its pronunciation feature by the feature vector [41, 2, 5, 71, 1, 47, 14, 7, 8, 24, 77, 31, 4, 7, 55, 23].
In the present application, the recommendation index determined by the ranking model may represent the degree to which the associated text is associated.
In the present application, the ranking model may be trained based on a single model, for example, the ranking model may be trained based on a machine learning model.
In the present application, the ranking model may also be obtained by training based on a combination model of a plurality of single models. For example, the ranking model disclosed in the present application may include a first sub-model and a second sub-model, where the first sub-model has a capability of memorizing a third text feature corresponding to the search text, and the second sub-model has a capability of generalizing the third text feature corresponding to the search text.
Based on this, in an embodiment of step 2512 shown in fig. 6, determining the recommendation indexes corresponding to the respective associated texts through the ranking model, based on the third text features corresponding to the search text and the respective associated texts, may be performed according to the steps shown in fig. 7.
Referring to fig. 7, a detailed flow chart for determining recommendation indexes corresponding to respective associated texts through the ranking model according to an embodiment of the application is shown. Specifically, the method comprises the following steps 25121 to 25123:
step 25121, inputting the search text and the third text feature corresponding to each associated text into the first sub-model to output a first recommendation index corresponding to each associated text.
And 25122, inputting the third text characteristics corresponding to the search text and each associated text into a second submodel to output a second recommendation index corresponding to each associated text.
Step 25123, for each associated text, performing weighted calculation on the first recommendation index and the second recommendation index to obtain a recommendation index corresponding to the associated text.
In order to make the ranking model better understood by those skilled in the art, the proposed ranking model in the present embodiment will be briefly described below with reference to fig. 8:
referring to FIG. 8, a model diagram of the ranking model according to one embodiment of the present application is shown.
As shown in FIG. 8, the ranking model may include an input layer, hidden layers, and an output layer. In the input layer, the third text feature 801 of the search text "X" and the third text features 804 of the associated texts "a, b, c, d, e" are input to the hidden layers of the first sub-model 802 and the second sub-model 805. The hidden layers of the first sub-model 802 and the second sub-model 805 learn from the third text features corresponding to the search text and each associated text, and the output layer outputs a first recommendation index 803 and a second recommendation index 806 corresponding to the associated texts "a, b, c, d, e", respectively; finally, a weighted calculation is performed on the first recommendation index 803 and the second recommendation index 806 to obtain the recommendation index 807 corresponding to the associated texts "a, b, c, d, e".
In this embodiment, the first sub-model may include a bilinear transformation model.
In this embodiment, the second sub-model may include a neural network model; specifically, the neural network model may include a plurality of hidden layers, each layer is fully connected, and the activation functions are ReLU activation functions.
In this application, the ranking model is composed of the first sub-model, which has memorization capability, and the second sub-model, which has generalization capability, so that the ranking model has both memorization and generalization capability at the same time. Its output therefore combines accuracy with extensibility, which further improves the user's text search efficiency in the subsequent text data processing.
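By way of non-limiting illustration, the two-branch ranking model described above could be sketched as follows (PyTorch is used here only for convenience; the feature dimensions, hidden sizes, combination weight and all names are assumptions of this description rather than values given by the patent):
    import torch
    import torch.nn as nn

    class RankingModel(nn.Module):
        """Illustrative two-branch ranking model: a bilinear first sub-model
        (memorization) combined with a fully connected ReLU network as the
        second sub-model (generalization)."""

        def __init__(self, feature_dim=16, hidden_dim=64, alpha=0.5):
            super().__init__()
            # First sub-model: bilinear transformation of the (search text, associated text) feature pair.
            self.bilinear = nn.Bilinear(feature_dim, feature_dim, 1)
            # Second sub-model: multi-layer fully connected network with ReLU activations.
            self.mlp = nn.Sequential(
                nn.Linear(2 * feature_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, 1),
            )
            self.alpha = alpha  # weight given to the first recommendation index

        def forward(self, search_feat, assoc_feat):
            # search_feat, assoc_feat: (batch, feature_dim) third-text-feature vectors
            first_index = self.bilinear(search_feat, assoc_feat)
            second_index = self.mlp(torch.cat([search_feat, assoc_feat], dim=-1))
            # Weighted combination of the first and second recommendation indexes.
            return self.alpha * first_index + (1.0 - self.alpha) * second_index
The associated texts would then be ranked in descending order of the combined recommendation index returned by this model.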
In this application, after the recommended text and the text attribute information are pushed for the user to be pushed, the steps shown in fig. 9 may also be performed.
Referring to fig. 9, a flowchart of a method after pushing the recommended text and the text attribute information for the user to be pushed according to an embodiment of the present application is shown. Specifically, the method comprises steps 281 and 282:
step 281, acquiring behavior data generated by the user to be pushed according to the pushed recommended text and the text attribute information.
And 282, retraining the ranking model based on the behavior data to obtain a retrained ranking model, wherein the retrained ranking model is used for ranking the next associated text.
In one embodiment of step 282 shown in FIG. 9, retraining the ranking model based on the behavior data may be performed according to the following steps:
and step 1, determining expected sequencing information corresponding to each associated text based on the behavior data.
And 2, correcting hidden layer parameters in the sequencing model through gradient reverse transfer by comparing the expected sequencing information with the reference sequencing information.
In this application, the behavior data may refer to data generated by the user's click behavior with respect to the pushed recommended texts and text attribute information, and the expected ranking information is determined based on this behavior data. For example, referring to fig. 4, if the user clicks the "field" character and its corresponding text attribute information, it can be determined that the "field" character has a higher expected rank (for example, it should rank first among the recommended texts).
In this application, still referring to fig. 4, the expected ranking information determined from the behavior data should therefore be the order "field", "normal", "taste", "factory", that is, the "field" character and its corresponding text attribute information should be recommended first, whereas the reference ranking information determined by the ranking model is the order "normal", "taste", "factory", "field". The expected ranking information thus differs from the reference ranking information.
Based on this difference, the hidden layer parameters in the ranking model are corrected through gradient back-propagation to obtain a retrained ranking model.
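By way of non-limiting illustration, one retraining step driven by the behavior data could look roughly as follows, building on the RankingModel sketch above; the pairwise margin loss is an assumption, since the patent does not name a specific loss function:
    import torch

    def retrain_step(model, optimizer, search_feat, assoc_feats, clicked_index):
        """One illustrative retraining step: the clicked recommended text defines the
        expected ranking (it should score above the other candidates), and the
        hidden-layer parameters are corrected by gradient back-propagation."""
        scores = model(search_feat.expand(assoc_feats.size(0), -1), assoc_feats).squeeze(-1)
        clicked = scores[clicked_index]
        others = torch.cat([scores[:clicked_index], scores[clicked_index + 1:]])
        # Hinge-style pairwise loss: the clicked text should outrank every other candidate.
        loss = torch.clamp(1.0 - (clicked - others), min=0.0).mean()
        optimizer.zero_grad()
        loss.backward()   # gradients propagate back through the hidden layers
        optimizer.step()  # hidden-layer parameters are corrected
        return loss.item()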
In the application, after the ranking model is applied to ranking each associated text, the ranking model is trained through the generated behavior data, so that parameters in the ranking model are continuously optimized and updated, and therefore more appropriate ranking information can be determined through the ranking model when each associated text is ranked next time. Therefore, the method and the device can improve the accuracy of sequencing the associated texts to a certain extent, so that the text search efficiency of the user is improved.
According to the technical scheme, when the text type of the input search text of the user is detected to belong to the text type limited in the preset word list, the associated text set associated with the search text is obtained, at least one associated text can be selected from the associated text set to serve as a recommended text, and text attribute information of the recommended text is obtained, so that the recommended text and the text attribute information are pushed for the user to be pushed. The recommended text and the corresponding text attribute information which are associated with the search text are pushed for the user, so that the reference for searching other associated texts can be provided for the user, the search text does not need to be input again when the user needs to search other associated texts, the search time of the user is saved, and the text search efficiency of the user is improved.
The following describes embodiments of an apparatus of the present application, which may be used to perform the text data processing method in the above-described embodiments of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the text data processing method described above in the present application.
FIG. 10 shows a block diagram of a text data processing device according to an embodiment of the present application.
Referring to fig. 10, a text data processing apparatus 1000 according to an embodiment of the present application includes: a first acquiring unit 1001, a second acquiring unit 1002, a selecting unit 1003 and a pushing unit 1004.
A first obtaining unit 1001 configured to obtain a search text input by a user to be pushed; a second obtaining unit 1002, configured to obtain an associated text set associated with the search text when detecting that a text type of the search text belongs to a text type defined in a preset vocabulary, where the associated text set includes associated texts; a selecting unit 1003, configured to select at least one associated text from the associated text set as a recommended text, and acquire text attribute information of the recommended text; a pushing unit 1004, configured to push the recommended text and the text attribute information for the user to be pushed.
In some embodiments of the present application, based on the foregoing solution, the pushing unit 1004 is configured to: acquiring a recommendation reference value for each recommended text, wherein the recommendation reference value is used for representing the searched popularity of the recommended text; and displaying the recommended text, and a recommended reference value and text attribute information corresponding to the recommended text in an interface.
In some embodiments of the present application, based on the foregoing solution, the second obtaining unit 1002 is configured to: identifying a target text type of the search text; and acquiring the associated text associated with the search text based on the target text type to obtain the associated text set.
In some embodiments of the present application, based on the foregoing solution, the second obtaining unit 1002 is further configured to: acquiring a text input by the user to be pushed within a preset historical time interval, and taking the text as a to-be-selected associated text; and determining the associated text from the associated texts to be selected to obtain the associated text set.
In some embodiments of the present application, based on the foregoing solution, the second obtaining unit 1002 is further configured to: acquiring a text similar to the search text in a first text characteristic as a to-be-selected associated text, wherein the first text characteristic comprises at least one of a semantic characteristic, a writing form characteristic and a pronunciation characteristic; and determining the associated text from the associated texts to be selected to obtain the associated text set.
In some embodiments of the present application, based on the foregoing solution, the second obtaining unit 1002 is further configured to: acquiring a text input by the user to be pushed in a preset historical time interval and a text similar to the search text in a second text characteristic, wherein the text is used as a related text to be selected, and the second text characteristic comprises at least one of a semantic characteristic, a writing form characteristic and a pronunciation characteristic; and determining the associated text from the associated texts to be selected to obtain the associated text set.
In some embodiments of the present application, based on the foregoing solution, the second obtaining unit 1002 includes: and the filtering unit is used for determining repeated texts in the associated texts to be selected and filtering the repeated texts to obtain the associated texts.
In some embodiments of the present application, based on the foregoing scheme, the selecting unit 1003 is configured to: sequencing each associated text in the associated text set through a pre-trained sequencing model to obtain reference sequencing information corresponding to each associated text, wherein the reference sequencing information is used for representing the associated degree of the associated text; and selecting at least one associated text from the associated text set as a recommended text based on the reference sorting information.
In some embodiments of the present application, based on the foregoing solution, the selecting unit 1003 is further configured to: acquiring third text characteristics corresponding to the search text and each associated text, wherein the third text characteristics comprise at least one of semantic characteristics, writing form characteristics and pronunciation characteristics; determining recommendation indexes corresponding to the associated texts through the ranking model based on third text characteristics corresponding to the search texts and the associated texts; and sequencing each associated text in the associated text set through the recommendation index.
In some embodiments of the present application, based on the foregoing scheme, the ranking model includes a first sub-model and a second sub-model, the first sub-model has an ability to memorize a third text feature corresponding to the search text, the second sub-model has an ability to generalize the third text feature corresponding to the search text, and the selecting unit 1003 is further configured to: inputting the search text and the third text characteristics corresponding to each associated text into a first sub-model to output a first recommendation index corresponding to each associated text; inputting the search text and third text characteristics corresponding to each associated text into a second submodel to output a second recommendation index corresponding to each associated text; and performing weighted calculation on the first recommendation index and the second recommendation index for each associated text to obtain the recommendation index corresponding to the associated text.
In some embodiments of the present application, based on the foregoing scheme, the first sub-model includes a bilinear transformation model, and the second sub-model is a neural network model.
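A minimal PyTorch sketch of such a ranking model, under the assumption that the bilinear transformation is a single bilinear interaction between the query-side and candidate-side characteristics and that the neural network sub-model is a small feed-forward network; the feature dimension, the hidden width of 64 and the weighting factor `alpha` are illustrative choices rather than values specified by this application.

```python
import torch
import torch.nn as nn


class RankingModel(nn.Module):
    """Weighted combination of a bilinear first sub-model and a feed-forward
    second sub-model, producing one recommendation index per pair of the
    search text and an associated text."""

    def __init__(self, feat_dim: int, alpha: float = 0.5):
        super().__init__()
        # First sub-model: bilinear interaction (memorization of query-candidate pairs).
        self.bilinear = nn.Bilinear(feat_dim, feat_dim, 1)
        # Second sub-model: small neural network over concatenated characteristics (generalization).
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )
        self.alpha = alpha  # weight of the first recommendation index

    def forward(self, query_feats: torch.Tensor, cand_feats: torch.Tensor) -> torch.Tensor:
        first_index = self.bilinear(query_feats, cand_feats).squeeze(-1)
        second_index = self.mlp(torch.cat([query_feats, cand_feats], dim=-1)).squeeze(-1)
        return self.alpha * first_index + (1.0 - self.alpha) * second_index


def rank_associated_texts(model: RankingModel,
                          query_feats: torch.Tensor,   # (n, feat_dim), search-text row repeated
                          cand_feats: torch.Tensor,    # (n, feat_dim), one row per associated text
                          associated_texts: list) -> list:
    """Order associated texts by recommendation index, highest first."""
    with torch.no_grad():
        scores = model(query_feats, cand_feats)
    order = torch.argsort(scores, descending=True)
    return [associated_texts[i] for i in order.tolist()]
```

The two branches mirror the memorization/generalization split described above: the bilinear term scores direct interactions between the search text and a candidate, the feed-forward branch generalizes over the concatenated characteristics, and the weighted sum gives the recommendation index used for ranking.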
In some embodiments of the present application, based on the foregoing solution, the apparatus further includes: a training unit configured to, after the recommended text and the text attribute information are pushed for the user to be pushed, acquire behavior data generated by the user to be pushed according to the pushed recommended text and the text attribute information, and retrain the ranking model based on the behavior data to obtain a retrained ranking model, wherein the retrained ranking model is used for ranking subsequently acquired associated texts.
In some embodiments of the present application, based on the foregoing solution, the training unit is configured to: determine expected ranking information corresponding to each associated text based on the behavior data; and correct hidden layer parameters in the ranking model through gradient backpropagation by comparing the expected ranking information with the reference ranking information.
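One way such retraining could look, reusing the RankingModel sketched above: click/skip behavior stands in for the expected ranking information, and a pairwise margin ranking loss drives the backpropagated correction. The application only states that hidden layer parameters are corrected by gradient backpropagation from the comparison of expected and reference ranking information, so the specific loss, optimizer and learning rate below are assumptions.

```python
import torch
import torch.nn as nn


def retrain_from_behavior(model: nn.Module,
                          query_feats: torch.Tensor,   # (n, feat_dim)
                          cand_feats: torch.Tensor,    # (n, feat_dim)
                          clicked: torch.Tensor,       # (n,), 1.0 where the user acted on the pushed text
                          lr: float = 1e-3) -> None:
    """One retraining step: associated texts the user acted on should score
    above the ones the user ignored; the gap is pushed back through the
    model's hidden layers by gradient descent."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MarginRankingLoss(margin=0.1)

    scores = model(query_feats, cand_feats)
    pos = scores[clicked > 0.5]    # expected to rank high
    neg = scores[clicked <= 0.5]   # expected to rank low
    if pos.numel() == 0 or neg.numel() == 0:
        return  # behavior data contains no usable preference pairs

    # Build all (positive, negative) pairs and require each positive score to exceed each negative one.
    pos_rep = pos.repeat_interleave(neg.numel())
    neg_rep = neg.repeat(pos.numel())
    target = torch.ones_like(pos_rep)

    loss = loss_fn(pos_rep, neg_rep, target)
    optimizer.zero_grad()
    loss.backward()   # gradients propagate back through the hidden layer parameters
    optimizer.step()
```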
As another aspect, the present application provides another text data processing apparatus, which includes a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the text data processing method described in the foregoing embodiments.
FIG. 11 shows a block diagram of a text data processing apparatus according to an embodiment of the present application. For example, the apparatus 1100 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to FIG. 11, the apparatus 1100 may include one or more of the following components: a processing component 1102, a memory 1104, a power component 1106, a multimedia component 1108, an audio component 1110, an input/output (I/O) interface 1112, a sensor component 1114, and a communication component 1116.
The processing component 1102 generally controls the overall operation of the apparatus 1100, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 1102 may include one or more processors 1120 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 1102 may include one or more modules that facilitate interaction between the processing component 1102 and other components. For example, the processing component 1102 may include a multimedia module to facilitate interaction between the multimedia component 1108 and the processing component 1102.
The memory 1104 is configured to store various types of data to support operation at the device 1100. Examples of such data include instructions for any application or method operating on device 1100, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1104 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 1106 provides power to the various components of the apparatus 1100. The power component 1106 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 1100.
The multimedia component 1108 includes a screen that provides an output interface between the device 1100 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 1108 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 1100 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 1110 is configured to output and/or input audio signals. For example, the audio component 1110 includes a microphone (MIC) configured to receive external audio signals when the apparatus 1100 is in an operating mode, such as a call mode, a recording mode, or a voice information processing mode. The received audio signals may further be stored in the memory 1104 or transmitted via the communication component 1116. In some embodiments, the audio component 1110 further includes a speaker for outputting audio signals.
The I/O interface 1112 provides an interface between the processing component 1102 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 1114 includes one or more sensors for providing various aspects of state assessment for the apparatus 1100. For example, the sensor component 1114 may detect the open/closed status of the apparatus 1100 and the relative positioning of components, such as the display and keypad of the apparatus 1100; the sensor component 1114 may also detect a change in the position of the apparatus 1100 or a component of the apparatus 1100, the presence or absence of user contact with the apparatus 1100, the orientation or acceleration/deceleration of the apparatus 1100, and a change in the temperature of the apparatus 1100. The sensor component 1114 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 1114 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 1114 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1116 is configured to facilitate wired or wireless communication between the apparatus 1100 and other devices. The apparatus 1100 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 1116 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1116 also includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 1100 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 1104 including instructions, which are executable by the processor 1120 of the apparatus 1100 to perform the text data processing method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
As another aspect, the present application also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the text data processing method described in the above embodiments.
As another aspect, the present application also provides a computer-readable storage medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable storage medium has stored therein at least one program code, which is loaded and executed by a processor of the apparatus to implement the operations performed by the text data processing method as described in the above embodiments.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present application, the features and functionality of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (12)

1. A method of processing text data, the method comprising:
acquiring a search text input by a user to be pushed;
when it is detected that the text type of the search text belongs to a text type defined in a preset word list, acquiring an associated text set associated with the search text, wherein the associated text set comprises an associated text;
selecting at least one associated text from the associated text set as a recommended text, and acquiring text attribute information of the recommended text;
and pushing the recommended text and the text attribute information for the user to be pushed.
2. The method according to claim 1, wherein the pushing the recommended text and the text attribute information for the user to be pushed comprises:
acquiring a recommendation reference value for each recommended text, wherein the recommendation reference value is used for representing the search popularity of the recommended text;
and displaying, in an interface, the recommended text together with the recommendation reference value and the text attribute information corresponding to the recommended text.
3. The method of claim 1, wherein the acquiring an associated text set associated with the search text comprises:
identifying a target text type of the search text;
and acquiring the associated text associated with the search text based on the target text type to obtain the associated text set.
4. The method of claim 1, wherein the acquiring an associated text set associated with the search text comprises:
acquiring a to-be-selected associated text, and determining the associated text from the to-be-selected associated text to obtain the associated text set;
wherein the acquiring a to-be-selected associated text comprises any one of the following:
acquiring a text input by the user to be pushed within a preset historical time interval, and taking the text as a to-be-selected associated text;
acquiring a text similar to the search text in a first text characteristic as a to-be-selected associated text, wherein the first text characteristic comprises at least one of a semantic characteristic, a writing form characteristic and a pronunciation characteristic;
and acquiring the text input by the user to be pushed within a preset historical time interval and the text similar to the search text in a second text characteristic as to-be-selected associated texts, wherein the second text characteristic comprises at least one of a semantic characteristic, a writing form characteristic and a pronunciation characteristic.
5. The method of claim 1, wherein the selecting at least one associated text from the associated text set as a recommended text comprises:
ranking each associated text in the associated text set through a pre-trained ranking model to obtain reference ranking information corresponding to each associated text, wherein the reference ranking information is used for representing the degree of association of the associated text;
and selecting at least one associated text from the associated text set as a recommended text based on the reference ranking information.
6. The method of claim 5, wherein the ranking each associated text in the associated text set through a pre-trained ranking model comprises:
acquiring third text characteristics corresponding to the search text and each associated text, wherein the third text characteristics comprise at least one of semantic characteristics, writing form characteristics and pronunciation characteristics;
determining recommendation indexes corresponding to the associated texts through the ranking model based on the third text characteristics corresponding to the search text and each associated text;
and ranking each associated text in the associated text set according to the recommendation indexes.
7. The method of claim 6, wherein the ranking model comprises a first sub-model and a second sub-model, the first sub-model has the ability to memorize the third text characteristics corresponding to the search text, the second sub-model has the ability to generalize the third text characteristics corresponding to the search text, and the determining recommendation indexes corresponding to the associated texts through the ranking model based on the third text characteristics corresponding to the search text and each associated text comprises:
inputting the search text and the third text characteristics corresponding to each associated text into the first sub-model to output a first recommendation index corresponding to each associated text;
inputting the search text and the third text characteristics corresponding to each associated text into the second sub-model to output a second recommendation index corresponding to each associated text;
and performing a weighted calculation on the first recommendation index and the second recommendation index for each associated text to obtain the recommendation index corresponding to the associated text.
8. The method according to any one of claims 5 to 7, wherein after the recommended text and the text attribute information are pushed for the user to be pushed, the method further comprises:
acquiring behavior data generated by the user to be pushed according to the pushed recommended text and the text attribute information;
and retraining the ranking model based on the behavior data to obtain a retrained ranking model, wherein the retrained ranking model is used for ranking subsequently acquired associated texts.
9. The method of claim 8, wherein the retraining the ranking model based on the behavior data comprises:
determining expected ranking information corresponding to each associated text based on the behavior data;
and correcting hidden layer parameters in the ranking model through gradient backpropagation by comparing the expected ranking information with the reference ranking information.
10. A text data processing apparatus, characterized in that the apparatus comprises:
a first acquisition unit, configured to acquire a search text input by a user to be pushed;
a second acquisition unit, configured to acquire an associated text set associated with the search text when it is detected that the text type of the search text belongs to a text type defined in a preset word list, wherein the associated text set comprises associated texts;
a selecting unit, configured to select at least one associated text from the associated text set as a recommended text and acquire text attribute information of the recommended text;
and a pushing unit, configured to push the recommended text and the text attribute information for the user to be pushed.
11. A text data processing apparatus, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs comprise instructions for performing the text data processing method according to any one of claims 1 to 9.
12. A computer-readable storage medium having stored therein at least one program code, the at least one program code being loaded into and executed by a processor to perform operations performed by the text data processing method according to any one of claims 1 to 9.
CN202111027394.6A 2021-09-02 2021-09-02 Text data processing method, device and medium Pending CN113849729A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111027394.6A CN113849729A (en) 2021-09-02 2021-09-02 Text data processing method, device and medium

Publications (1)

Publication Number Publication Date
CN113849729A (en) 2021-12-28

Family

ID=78976873


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462512A (en) * 2014-12-19 2015-03-25 北京奇虎科技有限公司 Chinese information search method and device based on knowledge graph
CN109271574A (en) * 2018-08-28 2019-01-25 麒麟合盛网络技术股份有限公司 A kind of hot word recommended method and device
CN110889733A (en) * 2018-09-10 2020-03-17 阿里巴巴集团控股有限公司 Resource and commodity recommendation method, device and equipment
CN111782928A (en) * 2019-05-20 2020-10-16 北京沃东天骏信息技术有限公司 Information pushing method and device and computer readable storage medium
CN112749344A (en) * 2021-02-04 2021-05-04 北京百度网讯科技有限公司 Information recommendation method and device, electronic equipment, storage medium and program product

Similar Documents

Publication Publication Date Title
CN109800325B (en) Video recommendation method and device and computer-readable storage medium
CN109684510B (en) Video sequencing method and device, electronic equipment and storage medium
US11520824B2 (en) Method for displaying information, electronic device and system
CN109145213B (en) Historical information based query recommendation method and device
CN110874145A (en) Input method and device and electronic equipment
CN109582768B (en) Text input method and device
CN112508612B (en) Method for training advertisement creative generation model and generating advertisement creative and related device
CN112307281A (en) Entity recommendation method and device
CN112784142A (en) Information recommendation method and device
CN112148923A (en) Search result sorting method, sorting model generation method, device and equipment
CN112631435A (en) Input method, device, equipment and storage medium
CN111831132A (en) Information recommendation method and device and electronic equipment
CN111753069B (en) Semantic retrieval method, device, equipment and storage medium
CN112948565A (en) Man-machine conversation method, device, electronic equipment and storage medium
CN110895558A (en) Dialog reply method and related device
CN112784151A (en) Method and related device for determining recommendation information
CN112306251A (en) Input method, input device and input device
CN112363631A (en) Input method, input device and input device
CN107301188B (en) Method for acquiring user interest and electronic equipment
CN113254611A (en) Question recommendation method and device, electronic equipment and storage medium
CN112000877B (en) Data processing method, device and medium
CN114676308A (en) Search term recommendation method and device, electronic equipment, storage medium and product
CN113849729A (en) Text data processing method, device and medium
CN113486978A (en) Training method and device of text classification model, electronic equipment and storage medium
CN112069405A (en) Method and device for processing periodic events

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination