CN111476037B - Text processing method and device, computer equipment and storage medium - Google Patents

Text processing method and device, computer equipment and storage medium

Info

Publication number
CN111476037B
Authority
CN
China
Prior art keywords
text
target
corrosion
relevant
structural element
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010289730.3A
Other languages
Chinese (zh)
Other versions
CN111476037A (en
Inventor
赵琳琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010289730.3A priority Critical patent/CN111476037B/en
Publication of CN111476037A publication Critical patent/CN111476037A/en
Application granted granted Critical
Publication of CN111476037B publication Critical patent/CN111476037B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The application relates to a text processing method, a text processing device, computer equipment and a storage medium. The method can be applied to big data processing; the semantic relevant text obtained by the processing can be combined with a user portrait for accurate recommendation, and the recommendation can be performed based on an artificial intelligence model. The method comprises the following steps: acquiring an initial relevant text corresponding to a target object; acquiring a target corrosion structural element; acquiring a target text element in the initial relevant text according to the position of a corrosion reference point corresponding to the target corrosion structural element; comparing the target corrosion structural element with the target text element to obtain a target comparison result; and when the target comparison result is consistent, performing corrosion processing on the initial relevant text to obtain a semantic relevant text corresponding to the target object. By adopting the method, the text processing efficiency can be improved.

Description

Text processing method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of information processing technologies, and in particular, to a text processing method and apparatus, a computer device, and a storage medium.
Background
With the development of information technology, there are more and more objects on the internet. In many scenarios, a text corresponding to an object needs to be obtained; for example, a label of the object can be derived from the text corresponding to the object, so that the object can be managed according to its label. The object may be, for example, an application or a video, and objects may be classified and managed according to the category of the application or the video.
The text data corresponding to an object often includes useless information. At present, such text data is usually processed manually, which results in low text processing efficiency.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a text processing method, apparatus, computer device and storage medium for solving the above technical problems.
A method of text processing, the method comprising: acquiring an initial relevant text corresponding to a target object; obtaining target corrosion structural elements; acquiring a target text element in the initial relevant text according to the position of a corrosion reference point corresponding to the target corrosion structural element; comparing the target corrosion structure element with the target text element to obtain a target comparison result; and when the target comparison result is consistent, performing corrosion processing on the initial relevant text to obtain a semantic relevant text corresponding to the target object.
A text processing apparatus, the apparatus comprising: the initial relevant text acquisition module is used for acquiring an initial relevant text corresponding to the target object; the target corrosion structural element acquisition module is used for acquiring a target corrosion structural element; a target text element obtaining module, configured to obtain a target text element in the initial relevant text according to a position of a corrosion reference point corresponding to the target corrosion structural element; the comparison module is used for comparing the target corrosion structure element with the target text element to obtain a target comparison result; and the corrosion module is used for corroding the initial relevant text to obtain a semantic relevant text corresponding to the target object when the target comparison result is consistent.
In some embodiments, the target text element obtaining module comprises: a current character determining unit, configured to determine a current character in the relevant text to be corroded, and use a position of the current character as a position of a corrosion reference point corresponding to the target corrosion structural element; when the initial relevant text is subjected to primary corrosion, the initial relevant text is taken as a relevant text to be corroded, and when the initial relevant text is not subjected to primary corrosion, the relevant text obtained by the last corrosion is taken as the relevant text to be corroded; and the target text element acquisition unit is used for acquiring a text element corresponding to the position of the target corrosion structural element in the relevant text to be corroded as the target text element according to the relative position relationship between the target corrosion structural element and the corrosion reference point.
In some embodiments, the target text element obtaining unit is configured to: and when the relative position relation is that the corrosion reference point is behind the target corrosion structural element, acquiring a head text element corresponding to the position of the target corrosion structural element in the relevant text to be corroded as the target text element.
In some embodiments, the target text element obtaining unit is configured to: and when the relative position relation is that the corrosion reference point is in front of the target corrosion structural element, acquiring a tail text element corresponding to the position of the target corrosion structural element in the relevant text to be corroded as the target text element.
In some embodiments, the target text element obtaining unit is configured to: when the relative position relation is that the corrosion reference point is in the target corrosion structural element, taking a text element positioned in front of the corrosion reference point in the target corrosion structural element as a first structural text element, and taking a text element positioned in back of the corrosion reference point in the target corrosion structural element as a second structural text element; acquiring a head text element corresponding to the position of the first structural text element and a tail text element corresponding to the position of the second structural text element from the to-be-corroded related text, and taking the head text element and the tail text element as the target text elements; the comparison module is used for: comparing the head text element with the first structure text element to obtain a first comparison result, and comparing the tail text element with the second structure text element to obtain a second comparison result; and obtaining the target comparison result according to the first comparison result and the second comparison result.
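The case in which the corrosion reference point lies inside the target corrosion structural element can be illustrated with a minimal sketch. The function name `compare_split` and the sample package names are hypothetical, and combining the first and second comparison results with a logical AND is an assumption; the text only states that the target comparison result is obtained according to both results.

```python
def compare_split(text: str, first_structural: str, second_structural: str) -> bool:
    """Compare the head and tail of `text` with the two parts of a structural element.

    `first_structural` is the part of the structural element in front of the
    reference point; `second_structural` is the part behind it (hypothetical names).
    """
    # Head text element: same length as the first structural text element.
    head = text[:len(first_structural)]
    # Tail text element: same length as the second structural text element.
    tail = text[len(text) - len(second_structural):]

    first_result = head == first_structural
    second_result = tail == second_structural

    # Assumption: the target comparison result is consistent only if both are.
    return first_result and second_result


print(compare_split("com.bankabc.apk", "com.", ".apk"))
print(compare_split("com.bankabc.cn", "com.", ".apk"))
```

In this sketch the first call compares consistently at both ends, while the second fails on the tail, so no corrosion would be performed for that structural element.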
In some embodiments, the targeted corrosion structural element acquisition module is to: obtaining target corrosion structural elements from the corrosion structural element set; the corrosion module is to: when the target comparison result is consistent, carrying out corrosion treatment on the relevant text to be corroded, and obtaining the current relevant text; when the initial relevant text is corroded for the first time, taking the initial relevant text as the relevant text to be corroded; and taking the current relevant text as the relevant text to be corroded, entering a step of acquiring a target corrosion structural element from a corrosion structural element set until the current relevant text is converged, and taking the converged current relevant text as the semantic relevant text corresponding to the target object.
In some embodiments, the module for determining the current relevant text convergence is for: in the current round of corrosion, a relevant text obtained by corrosion treatment by using the last corrosion structural element in the corrosion structural element set is used as the relevant text obtained by the current round of corrosion; and comparing the related text obtained by the corrosion of the current round with the related text obtained by the corrosion of the previous round, and determining that the current related text is converged when the comparison is consistent.
In some embodiments, the apparatus further comprises: a relevant text set acquisition module used for acquiring a relevant text set; a candidate structural element set acquisition module, configured to acquire text elements in the relevant text set to form a candidate structural element set; the importance acquisition module is used for acquiring the importance of each candidate structural element in the candidate structural element set in the relevant text set; the screening module is used for screening candidate structural elements meeting the importance degree condition from the candidate structural element set according to the importance degree of the candidate structural elements to form an intermediate structural element set, and the intermediate structural element set is used for determining the target corrosion structural elements; the importance condition includes at least one of an importance being less than a first importance or an importance ranking being lower than the first ranking.
In some embodiments, the target object comprises a target application, and the relevant text collection acquisition module is configured to: acquiring application program package names respectively corresponding to application programs in an application program set to form a related text set; the initial relevant text acquisition module is used for: taking an application program of which the label is to be determined as a target application program, and acquiring an application program package name corresponding to the target application program as an initial related text corresponding to the target application program.
In some embodiments, the apparatus further comprises: and the object label acquisition module is used for acquiring the target semantics corresponding to the semantic related text and acquiring the object label of the target object according to the target semantics.
In some embodiments, the target object includes a target application, the object tag corresponding to the target object includes a program tag corresponding to the target application, and the apparatus further includes: the push application program determining module is used for taking the application program corresponding to the program tag as a push application program; and the pushing module is used for determining a target terminal provided with the target application program and pushing the program related information corresponding to the pushed application program to the target terminal.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program: acquiring an initial related text corresponding to a target object; obtaining target corrosion structural elements; acquiring target text elements in the initial relevant text according to the positions of corrosion reference points corresponding to the target corrosion structural elements; comparing the target corrosion structure element with the target text element to obtain a target comparison result; and when the target comparison result is consistent, performing corrosion processing on the initial relevant text to obtain a semantic relevant text corresponding to the target object.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of: acquiring an initial relevant text corresponding to a target object; obtaining target corrosion structural elements; acquiring a target text element in the initial relevant text according to the position of a corrosion reference point corresponding to the target corrosion structural element; comparing the target corrosion structure element with the target text element to obtain a target comparison result; and when the target comparison result is consistent, carrying out corrosion treatment on the initial relevant text to obtain a semantic relevant text corresponding to the target object.
According to the text processing method, the text processing device, the computer equipment and the storage medium, the target corrosion structure element can be obtained for the initial relevant text corresponding to the target object, the target text element in the initial relevant text is obtained according to the position of the corrosion reference point corresponding to the target corrosion structure element, the target corrosion structure element is compared with the target text element to obtain a target comparison result, and when the target comparison result is consistent, the initial relevant text is subjected to corrosion processing to obtain the semantic relevant text corresponding to the target object. During corrosion treatment, the target text element in the initial relevant text is obtained according to the position of the corrosion reference point corresponding to the target corrosion structure element, so that the text element to be corroded can be accurately obtained. By removing the text elements consistent with the corrosion structure elements, irrelevant texts in the initial relevant texts corresponding to the target object can be removed, and semantic relevant texts with semantic value are obtained, so that the efficiency and the accuracy of text processing are improved.
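The overall flow, including the repeated rounds of corrosion until the text converges, can be sketched as follows. This is a simplified illustration rather than the claimed implementation: the function names and the sample corrosion structural elements are hypothetical, and the sketch checks both ends of the text for every element instead of deriving the end from a corrosion reference point.

```python
from typing import List


def erode_once(text: str, element: str) -> str:
    """Remove `element` from the head and/or tail of `text` when it matches there."""
    if element and text.startswith(element):
        text = text[len(element):]
    if element and text.endswith(element):
        text = text[:len(text) - len(element)]
    return text


def erode_until_converged(text: str, elements: List[str]) -> str:
    """Run rounds over the whole element set until a round changes nothing."""
    previous = None
    current = text
    while current != previous:
        previous = current
        # One round: apply every corrosion structural element in the set in turn.
        for element in elements:
            current = erode_once(current, element)
    return current


# Hypothetical element set; the order forces more than one round here.
print(erode_until_converged("com.android.bankabc", ["android.", "com."]))
```

Because every successful corrosion shortens the text, the loop always terminates; the text left after the last unchanged round plays the role of the semantic relevant text.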
Drawings
FIG. 1 is a diagram of an application environment of a text processing method in some embodiments;
FIG. 2 is a flow diagram that illustrates a method for text processing in some embodiments;
FIG. 3 is a schematic diagram of an application push interface in some embodiments;
FIG. 4 is a schematic illustration of the determination of a target text element in some embodiments;
FIG. 5 is a flow diagram that illustrates a method for text processing in some embodiments;
FIG. 6 is a flow diagram that illustrates a method for text processing in some embodiments;
FIG. 7 is a schematic flow chart of obtaining the corrosion structural elements of a corrosion structural element set in some embodiments;
FIG. 8 is a block diagram of a text processing apparatus in some embodiments;
FIG. 9 is a diagram of the internal structure of a computer device in some embodiments.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The method provided by the embodiments of the application may involve the processing of big data; for example, it can be applied to a scenario in which the package names of a large number of application programs are processed. Big data refers to a data set that cannot be captured, managed and processed by conventional software tools within a certain time range; it is a massive, fast-growing and diversified information asset that yields stronger decision-making power, insight discovery and process optimization capability only when new processing modes are applied. With the advent of the cloud era, big data has attracted more and more attention, and special techniques are needed to process large amounts of data effectively within a tolerable elapsed time. Technologies applicable to big data include massively parallel processing databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet and scalable storage systems.
According to the method provided by the embodiments of the application, after the initial relevant text corresponding to the target object is processed to obtain the semantic relevant text, accurate recommendation can be performed based on the semantic relevant text obtained by the processing and a user portrait. For example, after the semantic relevant text is obtained, the object tag of the target object may be obtained according to the semantic relevant text, and the object is recommended to users according to the object tag. As a practical example, if the target object is an application and the corresponding tag is the game category, the application may be pushed to the terminal of a user whose user portrait is "likes games".
According to the method provided by the embodiments of the application, accurate recommendation can be performed based on an artificial intelligence model. Artificial intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The basic artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.
Natural language processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robotic question answering, knowledge graphs, and the like.
Machine learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specifically studies how a computer simulates or realizes human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental way to make computers intelligent, and is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and formal education learning.
For example, after the semantic related text is obtained, an object tag of the target object may be obtained from the semantic related text. The object tag is used as a feature of the object and the user portrait is used as a feature of the user; the two features are input into a pre-trained recommendation model, which outputs a recommendation probability of recommending the object to the user. If the recommendation probability is greater than a first probability, for example 0.6, the object may be pushed to the terminal corresponding to the user. The pre-trained recommendation model is obtained by learning with a machine learning algorithm.
For another example, after the semantic related text is obtained, a vectorized representation of the target object may be obtained according to the semantic related text and a word embedding model, and a vectorized representation of the user is also obtained. The vector similarity between the two representations is calculated, an object whose vector similarity is greater than a first similarity, for example 0.8, is used as a push object for the user, and the push object is pushed to the terminal corresponding to the user; for example, an application program is recommended to the terminal corresponding to the user.
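The similarity-based selection can be sketched as follows. The text does not name a similarity measure, so cosine similarity is used here as an assumption; the function names, the toy two-dimensional vectors and the object names are hypothetical and stand in for representations that would come from a word embedding model.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_push_objects(
    user_vec: Sequence[float],
    object_vecs: Dict[str, Sequence[float]],
    first_similarity: float = 0.8,
) -> List[str]:
    """Return the objects whose similarity to the user exceeds the first similarity."""
    return [
        name
        for name, vec in object_vecs.items()
        if cosine_similarity(user_vec, vec) > first_similarity
    ]


# Toy vectors (hypothetical); real ones would be produced by an embedding model.
user = [1.0, 0.0]
objects = {"game_app": [0.9, 0.1], "bank_app": [0.1, 0.9]}
print(select_push_objects(user, objects))
```

With these toy vectors only `game_app` exceeds the example threshold of 0.8, so it is the only push object.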
The text processing method provided by the application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. The terminal 102 includes target objects. The server 104 may obtain the initial relevant text of each target object in the terminal, execute the method provided by the embodiments of the application to obtain the semantic relevant text corresponding to each target object, obtain a tag of the target object according to the semantic relevant text, and push information about objects to the terminal 102 according to the tag of the target object.
For example, application programs are installed on the terminal 102. The server 104 may obtain the package name of each application program in the terminal 102, extract the semantic related text that carries semantics from the package name, obtain a tag of the application program according to the semantic related text, and push program related information to the terminal 102 according to the tag of the application program.
The server 104 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, a big data and artificial intelligence platform, and the like. The terminal 102 may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
In some embodiments, as shown in FIG. 2, a text processing method is provided. The method is described by taking its application to the server in FIG. 1 as an example, and includes the following steps:
Step S202: acquire an initial relevant text corresponding to the target object.
The target object may be any object, for example any of a user, an application, a video, a voice clip, or an image. Text is a representation of written language. A text includes language words and may also include at least one of numbers or punctuation. Language words are words expressed in a language; for example, a language word may be a word expressed in English or Chinese. The initial relevant text corresponding to the target object is a text related to the target object that still needs text processing, and is therefore called the initial relevant text. As a practical example, the target object may be an application and the initial relevant text may be a package name of the application, which may be "com. The package name is a unique identifier of an installation package of the application, and may be, for example, a unique identifier of an Android application package (APK). The Android system may use the package name as an index for managing applications. When an application is started, the name of the process corresponding to the application can be represented by the package name.
In some embodiments, the server may obtain the initial relevant text corresponding to the target object according to the tag determination request when receiving the tag determination request corresponding to the target object. For example, the tag determination request may carry a name of the target object and an initial relevant text corresponding to the target object. Or, the tag determination request may carry a range of an object whose tag needs to be determined, and an object in the range may be determined according to the range of the object, and the object is used as a target object, so as to obtain an initial relevant text corresponding to the target object. For example, the range of the object may include an identifier of a server, and the application program stored in the server corresponding to the identifier of the server may be used as a target object, and a package name of the installation package of the application program may be obtained as an initial relevant text corresponding to the application program. An application installed on a terminal, such as a mobile phone, may also be used as a target object, and a process name corresponding to the application may be obtained as a corresponding package name.
In some embodiments, the target object may be an object with missing tags. An object with missing tags may refer to an object whose number of tags is less than or equal to a first number. The first number may be set as desired and may be, for example, 1. For example, the name or description information of an application may be used as a label of the application, so at least one of the name or description information of the application may be obtained; when it is determined that the name or description information of the application is missing, the package name of the application may be obtained as the initial relevant text corresponding to the application, so that the label of the application can be determined from the package name.
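The selection of objects with missing tags can be illustrated with a minimal sketch. The `AppRecord` type and the function name are hypothetical; only the rule "number of tags less than or equal to a first number" and the example value 1 come from the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AppRecord:
    """Hypothetical record for an application and the tags already known for it."""
    package_name: str
    tags: List[str] = field(default_factory=list)


def initial_relevant_text(app: AppRecord, first_number: int = 1) -> Optional[str]:
    """Return the package name as the initial relevant text if tags are missing."""
    if len(app.tags) <= first_number:
        return app.package_name
    # Enough tags are already known, so the application is not a target object.
    return None


print(initial_relevant_text(AppRecord("com.android.bankabc")))
print(initial_relevant_text(AppRecord("com.android.bankabc", ["finance", "bank"])))
```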
Step S204: obtain the target corrosion structural element.
Text corrosion refers to a process of eliminating elements located at the edges of a text, so that the text shrinks inwards. The edges of the text are its two ends. When the text is corroded, either one end or both ends may be corroded.
An element is a constituent unit of text; a text may be divided into elements, and one element may include one or more characters. The division of a text into elements can be set as needed; for example, one element can be a word, or one element can be a paragraph. A structural element is a text element that has a structure. For text, structure refers to the arrangement of the characters in the text. A structural element may include at least two characters whose arrangement is fixed, that is, the structural element formed by the characters may be regarded as a whole. For example, the structural element may be ".com", in which the characters ".", "c", "o" and "m" appear in this order.
The target corrosion structural element is used for text corrosion, and one target corrosion structural element may include one or more text elements. "A plurality" means at least two. The target corrosion structural element may be stored in advance and may be set as needed. A corrosion structural element is a text element with unclear semantics; it can be a text element without semantics or a generic text element. A text element without semantics carries no meaning, for example the number "123". A generic text element is one that is widely shared, for example one that commonly occurs in the initial relevant texts of many objects. For example, assuming the target object is an application, a corrosion structural element may be a structural text element that commonly appears in the package names of applications, such as ".com", ".android" or ".cn".
Specifically, the server may store a set of corrosion structural elements for performing corrosion in advance, and obtain a target corrosion structural element from the set of corrosion structural elements.
In some embodiments, the server may obtain the importance of text elements and use a text element satisfying an importance condition as the target corrosion structural element, where the importance condition may include at least one of the importance being smaller than a preset importance or the importance ranking being lower than a preset ranking. The importance of a candidate structural element may be represented, for example, by its term frequency-inverse document frequency (TF-IDF) value.
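The importance-based screening can be sketched as follows. Several details are assumptions made only for illustration: package names are split on "." to form candidate elements, the natural logarithm is used for the inverse document frequency, and the per-document TF-IDF values of an element are averaged into a single importance. The function names, the sample package names and the threshold 0.2 are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def element_importance(texts: List[str]) -> Dict[str, float]:
    """Average TF-IDF of each dot-separated element over the texts that contain it."""
    docs = [text.split(".") for text in texts]
    # Document frequency: number of texts in which each element occurs.
    doc_freq = Counter(element for doc in docs for element in set(doc))
    totals: Dict[str, float] = {}
    for doc in docs:
        for element, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(len(docs) / doc_freq[element])
            totals[element] = totals.get(element, 0.0) + tf * idf
    return {element: totals[element] / doc_freq[element] for element in totals}


def select_corrosion_elements(texts: List[str], first_importance: float) -> List[str]:
    """Keep the candidate elements whose importance is below the threshold."""
    importance = element_importance(texts)
    return [e for e, score in importance.items() if score < first_importance]


# Hypothetical package names used as the relevant text set.
packages = ["com.android.bankabc", "com.android.musicxyz", "com.game.chess"]
print(select_corrosion_elements(packages, first_importance=0.2))
```

Elements that occur in almost every package name receive an inverse document frequency near zero, so they fall below the threshold and become candidates for corrosion, while rare elements that carry the semantics are kept.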
Step S206: acquire the target text element in the initial relevant text according to the position of the corrosion reference point corresponding to the target corrosion structural element.
Specifically, the corrosion reference point is the point that serves as a reference during the corrosion processing. For example, the corrosion reference point may be an origin: a text may be regarded as being composed of characters arranged at various positions, so an origin corresponding to the text can be defined. The relative position relationship between the origin and the target corrosion structural element can be defined as required and can be preset. For example, the origin may be located within the target corrosion structural element, before the target corrosion structural element, or after the target corrosion structural element. Before the target corrosion structural element means before its first character, and after the target corrosion structural element means after its last character.
The target text element is an element in the initial relevant text and can be obtained from the relevant text to be corroded. When the initial relevant text is corroded for the first time, the initial relevant text is taken as the relevant text to be corroded; otherwise, the relevant text obtained by the last corrosion is taken as the relevant text to be corroded.
To obtain the target text element according to the position of the corrosion reference point, the position of the corrosion reference point in the relevant text to be corroded can be determined first; the text element corresponding to the position of the target corrosion structural element is then determined according to the relative position relationship between the target corrosion structural element and the corrosion reference point, and is used as the target text element. For example, for the relevant text "com.android.bankabc" to be corroded, it is assumed that the position of the corrosion reference point is the character "a" in "android", the target corrosion structural element is "com. It can be obtained that the "com." in the relevant text "com.android.bankabc" to be corroded corresponds to the position of the target corrosion structural element, that is, both the "com." in the relevant text to be corroded and the target corrosion structural element "com." are in front of the corrosion reference point. The "com." in "com.android.bankabc" can therefore be used as the target text element.
In some embodiments, the target text element is an end text element. A text has two ends, and an end text element is a text element located at an end; it may be at least one of a head text element or a tail text element. For example, for the text "com.android.bankabc", the head text element may be "com." and the tail text element may be ".bankabc".
Step S208, comparing the target corrosion structural element with the target text element to obtain a target comparison result.
Specifically, the target comparison result may be consistent or inconsistent. For example, if the target corrosion structural element is ".com" and the target text element is ".cm", the comparison is inconsistent. If the target corrosion structural element is ".com" and the target text element is ".com", the comparison is consistent.
Step S210, when the target comparison result is consistent, performing corrosion processing on the initial relevant text to obtain the semantically related text corresponding to the target object.
Semantically related text refers to text that carries semantics, that is, meaning. The initial relevant text is considered to include semantically related text, whose semantics are clear, and semantically unrelated text, whose semantics are unclear. Text corrosion aims to remove the semantically unrelated text from the initial relevant text to obtain the semantically related text. For example, for the application package name "com.wzry", text processing yields "wzry", which is semantically related text representing the name of a game. "com" is semantically unrelated text, since it occurs in many package names, usually as a prefix or suffix, and has no special meaning. It is understood that whether a text element, such as a word, is semantically related or unrelated can differ depending on the application scenario. For example, when the target object is an application, "android" is a generic term that indicates an Android application and occurs in the package names of many programs, so "android" can be treated as a semantically unrelated element. In other application scenarios, however, "android" may be a semantically related element.
When the target comparison result is consistent, the target text element is corroded, that is, deleted from the relevant text to be corroded. If the comparison is inconsistent, no corrosion is performed. It is to be understood that corrosion processing may be performed multiple times, and each pass may use the same target corrosion structural element or a different one. If the text undergoes multiple corrosion passes, the relevant text obtained by the final pass is used as the semantically related text.
According to the text processing method, the target corrosion structure element can be obtained for the initial relevant text corresponding to the target object, the target text element in the initial relevant text is obtained according to the position of the corrosion reference point corresponding to the target corrosion structure element, the target corrosion structure element is compared with the target text element to obtain a target comparison result, and when the target comparison result is consistent, the initial relevant text is subjected to corrosion processing to obtain the semantic relevant text corresponding to the target object. During corrosion treatment, the target text element in the initial relevant text is obtained according to the position of the corrosion reference point corresponding to the target corrosion structure element, so that the text element to be corroded can be accurately obtained. By removing the text elements consistent with the corrosion structure elements, irrelevant texts in the initial relevant texts corresponding to the target objects can be removed, and semantic relevant texts with semantic value are obtained, so that the efficiency and the accuracy of text processing are improved.
In some embodiments, the text processing method further comprises: and acquiring target semantics corresponding to the semantics related text, and obtaining an object tag of the target object according to the target semantics.
Specifically, object tags are used to describe objects. An object tag may be, for example, attribute information of the object, such as its category. When the target object is an application program, the object tag corresponding to the target object may be the category of the application program, for example whether it belongs to a game category or a learning category. When the target object is a video, the object tag corresponding to the target object may be the name of a person in the video. The correspondence between the semantically related text and the target semantics may be preset, or the semantically related text may be translated, for example by inputting it into a translation model to obtain the target semantics. For example, inputting the English semantically related text "baby" into the translation model yields the Chinese target semantics meaning "baby".
As a practical example, table one shows the correspondence among the application package name (initial relevant text), the semantically related text obtained by corroding it (corrosion result), the target semantics obtained from the semantically related text, and the application label obtained from the target semantics.
Table one:
Application package name | Corrosion result | Target semantics | Application label
com.android.bankabc | bankabc | Agricultural bank | Bank category
com.android.baby | baby | Baby | Infant category
com.tudou.android | tudou | Potato video | Video category
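The preset correspondence between semantically related text, target semantics, and label can be sketched as a lookup table. This is only an illustration: the dictionary `PRESET` mirrors the three rows of table one, and the function name `label_for` is hypothetical.

```python
# Preset correspondence mirroring table one:
# corrosion result -> (target semantics, application label).
PRESET = {
    "bankabc": ("Agricultural bank", "Bank category"),
    "baby": ("Baby", "Infant category"),
    "tudou": ("Potato video", "Video category"),
}

def label_for(semantic_text):
    """Return (target semantics, label), or None when no preset entry exists."""
    return PRESET.get(semantic_text)

print(label_for("bankabc"))  # ('Agricultural bank', 'Bank category')
```

A corrosion result without a preset entry returns `None`; in that case the translation-model route described above would be the alternative.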
In some embodiments, the target object includes a target application, the object tag corresponding to the target object includes a program tag corresponding to the target application, and the text processing method further includes:
taking the application program corresponding to the program label as a push application program; and determining a target terminal provided with a target application program, and pushing program related information corresponding to the pushed application program to the target terminal.
Specifically, the corresponding relationship between the semantic related text and the program tag may be preset, and the corresponding relationship between the semantic related text and the program tag may be obtained by semantic mining the program tag by the server, or may be manually set. The program tags may be, for example, entertainment, games, or english, etc. The application program corresponding to the program tag means that the application programs have the same program tag. The program-related information corresponding to the push application is information related to the push application. For example, may be at least one of a download link, introduction, or name of the push application. By pushing the program related information corresponding to the pushing application program to the target terminal provided with the target application program, the pushing accuracy can be improved, and the waste of pushing resources is reduced. For example, if the terminal of the user 1 is provided with the application a, and the tag of the application a is a game type, it is possible to acquire applications of the game type, such as the applications B and C. As shown in fig. 3, the download link and the name corresponding to each of the applications B and C may be pushed to the terminal of the user 1, and when the terminal receives a touch operation, such as a click operation, for the "download" control corresponding to the application B, the terminal may download the installation package of the application B according to the download link of the application B.
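The selection of push application programs by shared program tag can be sketched as follows. The data structure (a dictionary from application name to tag), the function name `push_candidates`, and application "D" are assumptions made for illustration.

```python
def push_candidates(target_app, app_tags):
    """Return the other applications that share the target application's tag."""
    tag = app_tags[target_app]
    return sorted(app for app, t in app_tags.items()
                  if t == tag and app != target_app)

# Application A is installed on the terminal of user 1; B and C share its tag.
app_tags = {"A": "game", "B": "game", "C": "game", "D": "learning"}
print(push_candidates("A", app_tags))  # ['B', 'C']
```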
In some embodiments, the step S206 of obtaining the target text element in the initial relevant text according to the position of the erosion reference point corresponding to the target erosion structure element includes: determining a current character in a related text to be corroded, and taking the position of the current character as the position of a corrosion reference point; and acquiring a text element corresponding to the position of the target corrosion structural element in the relevant text to be corroded as the target text element according to the relative position relation between the target corrosion structural element and the corrosion reference point.
The current character may be determined according to a preset character determination rule. The rule may be, for example, to follow the arrangement order of the characters, or to select a character in the relevant text to be corroded at random as the current character. For example, the characters in the relevant text to be corroded may be used as the current character one after another in their arrangement order, until every character has been used as the current character or until the target comparison result is consistent.
For example, for the text to be corroded "com.android.bankabc", the position of the first character "c" may first be used as the position of the corrosion reference point; the target text element obtained is empty, which is inconsistent with the target corrosion structural element "com.", so no corrosion is performed. Then the position of the second character "o" is used as the position of the corrosion reference point; the target text element obtained is "c", which is inconsistent with "com.", so no corrosion is performed. Then the position of the third character "m" is used; the target text element obtained is "co", which is inconsistent with "com.", so no corrosion is performed. Then the position of the fourth character "." is used; the target text element obtained is "com", which is inconsistent with "com.", so no corrosion is performed. Then the position of the fifth character "a" is used; the target text element obtained is "com.", which is consistent with the target corrosion structural element, so the text to be corroded is corroded. At this point, the step of determining the current character in the relevant text to be corroded and taking its position as the position of the corrosion reference point may also be stopped.
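The character-by-character scan in this example can be reproduced with a short Python sketch. It assumes the corrosion reference point is behind the target corrosion structural element, so the target text element is the run of characters immediately before the current character; the function name `scan_for_head_element` is hypothetical.

```python
def scan_for_head_element(text, element):
    """Scan reference positions left to right.

    Returns (trace, result), where trace lists (current character,
    target text element) pairs and result is the text after corrosion.
    """
    n = len(element)
    trace = []
    for i, current in enumerate(text):
        # Characters occupying the element's slots just before the reference point.
        window = text[max(0, i - n):i]
        trace.append((current, window))
        # Consistent comparison, and the element starts at the head of the text.
        if i == n and window == element:
            return trace, text[i:]
    return trace, text

trace, result = scan_for_head_element("com.android.bankabc", "com.")
print(trace)   # [('c', ''), ('o', 'c'), ('m', 'co'), ('.', 'com'), ('a', 'com.')]
print(result)  # android.bankabc
```

The scan stops at the fifth character, matching the walkthrough, and no prior division of the text into elements is needed.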
The relative positional relationship may be that the corrosion reference point is within the target corrosion structural element, that the corrosion reference point is behind the target corrosion structural element, or that the corrosion reference point is in front of the target corrosion structural element. For example, assume that the corrosion reference point is denoted by "*" and the target corrosion structural element is "abc". When the relative positional relationship between the target corrosion structural element and the corrosion reference point is written as "abc*", the corrosion reference point is located behind the target corrosion structural element. When it is written as "*abc", the corrosion reference point is located in front of the target corrosion structural element. When it is written as "a*bc" or "ab*c", the corrosion reference point is located within the target corrosion structural element. Positional correspondence means that the positions coincide.
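A minimal sketch of how such a notation could be interpreted, assuming "*" marks the corrosion reference point as in table two. The function names `split_at_reference` and `reference_position` are hypothetical.

```python
def split_at_reference(pattern, marker="*"):
    """Split a pattern such as 'com.*' into the parts before and after the marker."""
    before, found, after = pattern.partition(marker)
    if not found or not (before or after):
        raise ValueError(f"not a usable pattern: {pattern!r}")
    return before, after

def reference_position(pattern):
    """Classify where the reference point lies relative to the structural element."""
    before, after = split_at_reference(pattern)
    if before and after:
        return "within"
    return "behind" if before else "in front"

print(reference_position("abc*"))  # behind
print(reference_position("*abc"))  # in front
print(reference_position("a*bc"))  # within
```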
The relevant text to be corroded refers to the text that needs to be corroded. The initial corrosion refers to the first corrosion. At the first corrosion, the initial relevant text is used as the relevant text to be corroded. Otherwise, the relevant text obtained by the previous corrosion is used as the relevant text to be corroded. For example, if the first corrosion is performed on "com.android.bankabc" and yields "android.bankabc", the second corrosion is performed with "android.bankabc" as the text to be corroded.
In the embodiment of the application, the current character in the relevant text to be corroded is determined, the position of the current character is used as the position of the corrosion reference point, and the text element corresponding to the position of the target corrosion structural element in the relevant text to be corroded is obtained according to the relative position relation between the target corrosion structural element and the corrosion reference point and is used as the target text element. And comparing the target corrosion structure element with the target text element to obtain a target comparison result. Therefore, under the condition that element division is not needed to be carried out on the related text to be corroded, the target text element matched with the target corrosion structure element is obtained, and the target text element is corroded.
In some embodiments, when the relative position relationship is that the corrosion reference point is behind the target corrosion structural element, a head text element corresponding to the position of the target corrosion structural element in the relevant text to be corroded is obtained as the target text element.
The head text element refers to an element located at the head of the relevant text to be corroded, that is, the head text element as a whole is at the very front of the relevant text to be corroded. For example, in "com.android.bankabc", "com." is a head text element, but "android" is not a head text element because it is preceded by "com.".
Specifically, when the relative positional relationship is that the corrosion reference point is behind the target corrosion structural element, the element that corresponds to the position of the target corrosion structural element and is located at the head of the relevant text to be corroded is acquired. When corrosion processing is intended to corrode the two ends of the text, an element that corresponds to the position of the target corrosion structural element but is not located at the head of the relevant text to be corroded is not taken as the target text element.
As a practical example, assume that the relevant text to be corroded is "com.android.bankabc" and the target corrosion structural element is "com.", with the corrosion reference point behind it. When the position of the corrosion reference point is determined for the first time, the first character "c" in "com.android.bankabc" is taken as the current character, and the position of "c" is taken as the position of the origin (*). As shown in fig. 4, since the target corrosion structural element "com." has 4 characters, the positions of the 4 characters can be represented by 4 squares. At this time no text exists in the 4 squares, so the target text element is empty, and so on. When the position of the corrosion reference point is determined for the fifth time, the position of the fifth character "a" from the front of "com.android.bankabc" is taken as the position of the corrosion reference point (*). At this time the text "com." exists in the 4 squares, so the target text element is "com.". The target text element is compared with the target corrosion structural element "com.", the comparison result is consistent, and "com." can therefore be removed from the relevant text to be corroded.
In the embodiment of the application, when the relative position relationship is that the corrosion reference point is behind the target corrosion structural element, the head text element corresponding to the position of the target corrosion structural element in the relevant text to be corroded is obtained and used as the target text element, so that the corrosion of the text can be guaranteed to be conducted on the text positioned at the end part of the text, and the correctness of the text corrosion is guaranteed. In addition, the topological structure, namely the arrangement sequence among elements, in the relevant text to be corroded can not be changed, and the semantic correctness of the semanteme relevant text obtained by corrosion can be ensured because the topological structure of the text has influence on the semanteme represented by the text.
In some embodiments, when the relative position relationship is that the corrosion reference point is before the target corrosion structural element, a tail text element corresponding to the position of the target corrosion structural element in the relevant text to be corroded is obtained as the target text element.
The tail text element refers to an element located at the tail of the relevant text to be corroded, that is, the tail text element as a whole is at the very end of the relevant text to be corroded. For example, in "com.android.bankabc", ".bankabc" is a tail text element, but "android" is not a tail text element because it is followed by ".bankabc".
Specifically, when the relative positional relationship is that the corrosion reference point is in front of the target corrosion structural element, the element that corresponds to the position of the target corrosion structural element and is located at the tail of the relevant text to be corroded is acquired. When corrosion processing is intended to corrode the two ends of the text, an element that corresponds to the position of the target corrosion structural element but is not located at the tail of the relevant text to be corroded is not taken as the target text element.
In the embodiment of the application, when the relative position relationship is that the corrosion reference point is in front of the target corrosion structural element, the tail text element corresponding to the position of the target corrosion structural element in the relevant text to be corroded is obtained and used as the target text element, so that when the relevant text to be corroded is corroded, the text element at the tail is corroded, that is, the text corrosion is guaranteed to be corroded for the text at the end of the text, and the text corrosion accuracy is guaranteed. And the topological structure, namely the arrangement sequence among elements in the related text to be corroded, can not be changed, and the semantic correctness of the semanteme related text obtained by corrosion can be ensured because the topological structure of the text has influence on the semanteme represented by the text.
In some embodiments, when the characters in the relevant text to be corroded are used as the current character one after another in their arrangement order, the sequence may start from the head character or from the tail character. For example, when the corrosion reference point is behind the target corrosion structural element, the target text element obtained is a head text element. The characters are therefore used as the current character starting from the head character, which increases the speed of finding a target text element that is consistent with the target corrosion structural element. Similarly, when the corrosion reference point is in front of the target corrosion structural element, the characters in the relevant text to be corroded may be used as the current character starting from the tail character.
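The tail case can be sketched in the same way as the head case, scanning from the tail character. This is an illustrative sketch assuming the corrosion reference point is in front of the target corrosion structural element, so the target text element is the run of characters immediately after the current character; the function name `corrode_tail` is hypothetical.

```python
def corrode_tail(text, element):
    """Remove element from the tail of text, scanning from the last character."""
    n = len(element)
    for i in range(len(text) - 1, -1, -1):   # start from the tail character
        # Characters occupying the element's slots just after the reference point.
        window = text[i + 1:i + 1 + n]
        # Consistent comparison, and the element ends at the tail of the text.
        if window == element and i + 1 + n == len(text):
            return text[:i + 1]
    return text

print(corrode_tail("com.tudou.android", ".android"))  # com.tudou
print(corrode_tail("com.android.baby", ".android"))   # com.android.baby
```

In the second call ".android" occurs in the middle of the text rather than at the tail, so it is not taken as the target text element and the text is left unchanged.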
In some embodiments, when the relative position relationship is that the erosion reference point is within the target erosion structure element, as shown in fig. 5, obtaining a text element corresponding to the position of the target erosion structure element in the relevant text to be eroded according to the relative position relationship between the target erosion structure element and the erosion reference point, and as the target text element, including:
Step S206A, using the text element located in front of the corrosion reference point in the target corrosion structural element as a first structural text element, and using the text element located behind the corrosion reference point in the target corrosion structural element as a second structural text element.
Specifically, assume that the target corrosion structural element is "com..test" and the corrosion reference point is located between "com." and ".test", denoted as "com.*.test". Then "com." may be taken as the first structural text element and ".test" as the second structural text element.
Step S206B, acquiring a head text element corresponding to the position of the first structural text element and a tail text element corresponding to the position of the second structural text element from the related text to be corroded, and taking the head text element and the tail text element as target text elements.
For example, assume the relevant text to be corroded is "bba.ten.mobile.test", with "com." as the first structural text element and ".test" as the second structural text element. When the corrosion reference point is located at "t" in "ten", the head text element corresponding to the position of the first structural text element is "bba.". When the corrosion reference point is located at "e" in "mobile", the tail text element corresponding to the position of the second structural text element is ".test".
As shown in fig. 5, the step S208 of comparing the target corrosion structure element with the target text element to obtain a target comparison result includes:
Step S208A, comparing the head text element with the first structural text element to obtain a first comparison result, and comparing the tail text element with the second structural text element to obtain a second comparison result.
Specifically, the first comparison result may be consistent or inconsistent, and the second comparison result may be consistent or inconsistent.
Step S208B, obtaining a target comparison result according to the first comparison result and the second comparison result.
Specifically, when both the first comparison result and the second comparison result are consistent, the target comparison result is consistent. When at least one of the first comparison result or the second comparison result is inconsistent, the target comparison result is inconsistent.
When the target comparison result is consistent, the head text element and the tail text element in the relevant text to be corroded are corroded. Otherwise, they are not corroded. Corroding the head and tail text elements together when the target comparison result is consistent allows irrelevant text elements whose head and tail occur in a fixed combination to be removed in a single step, which improves corrosion efficiency.
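Steps S206A to S208B can be sketched as follows. As a simplification, the head and tail text elements are taken directly from the two ends of the text rather than by moving the reference point; the function name `corrode_both_ends` and the second input "com.ten.mobile.test" are hypothetical.

```python
def corrode_both_ends(text, first, second):
    """Corrode head and tail together when both comparisons are consistent.

    first:  structural text element in front of the reference point.
    second: structural text element behind the reference point.
    """
    head = text[:len(first)]
    tail = text[len(text) - len(second):]
    first_ok = head == first      # first comparison result
    second_ok = tail == second    # second comparison result
    # Also require that head and tail do not overlap and something remains.
    if first_ok and second_ok and len(text) > len(first) + len(second):
        return text[len(first):len(text) - len(second)]
    return text

print(corrode_both_ends("bba.ten.mobile.test", "com.", ".test"))  # bba.ten.mobile.test
print(corrode_both_ends("com.ten.mobile.test", "com.", ".test"))  # ten.mobile
```

In the first call the tail matches ".test" but the head "bba." does not match "com.", so the target comparison result is inconsistent and nothing is removed.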
In some embodiments, as shown in fig. 6, the step S204 of obtaining the target erosion structural element includes: and obtaining the target corrosion structural element from the corrosion structural element set.
Specifically, the corrosion structural element set may include a plurality of corrosion structural elements. Corrosion structural elements may be obtained from the set in a preset order or randomly as the target corrosion structural element. For example, table two shows the corrosion structural elements of some embodiments, where "*" indicates the corrosion reference point. At the first corrosion, "*.com" may be taken as the target corrosion structural element, and at the second corrosion, "*.android" may be taken as the target corrosion structural element.
As shown in fig. 6, the step S210 of performing erosion processing on the initial relevant text according to the target comparison result to obtain the semantic relevant text corresponding to the target object includes:
Step S602, performing corrosion processing on the relevant text to be corroded according to the target comparison result to obtain the current relevant text, where at the first corrosion the initial relevant text is used as the relevant text to be corroded.
Specifically, the current relevant text refers to the relevant text obtained by the current corrosion processing. For example, table three shows the corrosion results obtained by performing the first corrosion on application package names with different corrosion structural elements. For the initial relevant text "com.android.bankabc", the current relevant text obtained with the target corrosion structural element "com.*" is "android.bankabc".
Table two: corrosion structural elements in the corrosion structural element set
*.com | com.*
*.android | android.*
*.cn | cn.*
*.app | app.*
*.net | net.*
Table three:
Application package name | Target corrosion structural element | Corrosion result
com.android.bankabc | com.* | android.bankabc
com.android.baby | *.android | com.android.baby
com.tudou.android | *.android | com.tudou
com.tudou.android | cn.* | com.tudou.android
Step S604, determine whether the current relevant text converges.
Specifically, convergence means that the current relevant text does not change relative to the target historical relevant text. If the text is converged, the corrosion is not performed, and the step S608 is performed to use the converged current relevant text as the semantic relevant text. If not, the step S606 is entered to use the current relevant text as the relevant text to be corroded, so as to continue to corrode.
The historical relevant text refers to relevant text obtained by corroding the initial relevant text before the current relevant text was obtained. For example, relative to the current relevant text obtained by the third corrosion processing, the relevant texts obtained by the first and second corrosion processing are historical relevant texts. The relevant text obtained by the previous corrosion may be used as the target historical relevant text, or the relevant text obtained by the previous round of corrosion may be used as the target historical relevant text.
In some embodiments, the step of determining that the current relevant text converges comprises: in the current round of corrosion, a related text obtained by corrosion treatment by using the last corrosion structural element in the corrosion structural element set is used as the related text obtained by the current round of corrosion; and comparing the related text obtained by the corrosion of the current round with the related text obtained by the corrosion of the previous round, and determining that the current related text is converged when the comparison is consistent.
Specifically, each corrosion structure element in the corrosion structure element set is sequentially used as a target corrosion structure element, and the corrosion treatment is performed on the relevant text to be corroded, which is called a round of corrosion. Multiple rounds of erosion may be performed on the initially relevant text. The step of determining whether the current relevant text converges may be performed after each round of erosion. For example, after the first round of corrosion is completed, whether the finally obtained current relevant text is the same as the initial relevant text in the first round of corrosion is judged, if yes, the text is converged, and the finally obtained current relevant text in the first round of corrosion is used as the semantic relevant text. And if not, performing the second round of corrosion on the current relevant text finally obtained in the first round of corrosion as the relevant text to be corroded, and comparing the current relevant text finally obtained in the second round of corrosion with the relevant text finally obtained in the first round of corrosion. If the comparison is consistent, the current relevant text is converged, and if the comparison is inconsistent, the current relevant text is not converged, and the third round of corrosion is continued.
And step S606, taking the current relevant text as the relevant text to be corroded.
Specifically, the current relevant text is taken as the relevant text to be corroded, and the process returns to step S204, that is, the step of obtaining the target corrosion structural element from the corrosion structural element set, so that another corrosion structural element can be obtained and used to corrode the relevant text to be corroded.
Step S608, the converged current relevant text is taken as a semantic relevant text.
Specifically, if the current relevant text has converged, the converged current relevant text is taken as the semantically relevant text.
In the embodiment of the application, the corrosion structural elements in the corrosion structural element set are used for corroding the initial relevant texts, when the current relevant texts are converged, the corrosion is stopped, and the semanteme irrelevant texts at the two ends of the initial relevant texts can be completely removed.
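The round-based convergence loop of steps S602 to S608 can be sketched as follows, using the patterns of table two. This is a simplified sketch: `str.startswith` and `str.endswith` stand in for the reference-point scan, and the names `corrode_once` and `corrode_until_converged` are hypothetical.

```python
ELEMENTS = ["*.com", "com.*", "*.android", "android.*",
            "*.cn", "cn.*", "*.app", "app.*", "*.net", "net.*"]

def corrode_once(text, pattern):
    """Apply one structural element; '*' marks the corrosion reference point."""
    head, _, tail = pattern.partition("*")
    if (len(text) > len(head) + len(tail)
            and text.startswith(head) and text.endswith(tail)):
        return text[len(head):len(text) - len(tail)]
    return text

def corrode_until_converged(text, elements=ELEMENTS):
    """Repeat rounds of corrosion until a round leaves the text unchanged."""
    while True:
        previous = text
        for pattern in elements:   # one round: every element in turn
            text = corrode_once(text, pattern)
        if text == previous:       # converged relative to the previous round
            return text

print(corrode_until_converged("com.android.bankabc"))  # bankabc
print(corrode_until_converged("com.tudou.android"))    # tudou
```

Each round that changes the text makes it strictly shorter, so the loop always terminates.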
In some embodiments, the corrosion structural elements in the corrosion structural element set are structural elements that satisfy an importance condition, the importance condition including at least one of the importance being less than a preset importance or the importance ranking being lower than a preset ranking. The following describes how the corrosion structural elements of the set are obtained, taking the target corrosion structural element as an example. As shown in fig. 7, the method includes the following steps:
step S702, a relevant text set is obtained.
Specifically, the relevant text set includes a plurality of relevant texts, which may be set as needed. The relevant text in the relevant text set may be the initial relevant text corresponding to the reference object. The reference object and the target object have the same object type, for example, the target object is application 1, and the related text in the related text set may be package names corresponding to 1000 applications.
In some embodiments, the target object comprises a target application, and obtaining the relevant text set comprises: acquiring the application package names respectively corresponding to the applications in an application set to form the relevant text set. The application set includes a plurality of applications; for example, all applications in an application market may be obtained, and the package names corresponding to these applications are obtained to form the relevant text set.
In some embodiments, obtaining the initial relevant text corresponding to the target object includes: taking an application whose tag is to be determined as the target application, and acquiring the application package name corresponding to the target application as the initial relevant text corresponding to the target application.
Specifically, an application whose tag is to be determined may be an application that lacks a tag, or an application that lacks a name. The target application may be obtained from the application set; for example, each application in the application set may be an application whose tag is to be determined.
Step S704, obtaining text elements in the relevant text set to form a candidate structural element set.
Specifically, the relevant texts in the relevant text set may be subjected to element division to obtain text elements. The set of candidate structuring elements comprises a plurality of candidate structuring elements.
For example, the relevant text set is called a corpus D, and each text in the corpus D may be segmented to obtain text elements, which form the candidate structural element set. For example, a package name may be divided according to punctuation marks, and each character string between punctuation marks is taken as one text element. As a practical example, the package name "com.wzrytp.desu" is divided into three text elements: "com", "wzrytp", and "desu".
Step S706, the importance, in the relevant text set, of each candidate structural element in the candidate structural element set is obtained.
Specifically, the importance indicates how important a candidate structural element is: the greater the importance, the more important the element. The importance of a candidate structural element may be determined according to the number of occurrences of the candidate structural element in a relevant text and the number of relevant texts in the relevant text set that include the candidate structural element. The importance is positively correlated with the number of occurrences and negatively correlated with the number of relevant texts that include the candidate structural element. That is, other factors being equal, the more often the candidate structural element occurs in the relevant text, the greater its importance; and other factors being equal, the more relevant texts in the relevant text set include the candidate structural element, the smaller its importance. For example, the importance may be the number of occurrences multiplied by the inverse of the number of relevant texts including the candidate structural element.
In some embodiments, the importance of a candidate structural element may be obtained from the term frequency and the inverse document frequency. For example, the importance may be the product of the term frequency (TF) and the inverse document frequency (IDF). The inverse document frequency can be obtained from the total number of texts in the relevant text set and the number of relevant texts that include the candidate structural element. The more relevant texts include the candidate structural element, the more generic the word is, and therefore the lower its importance. The term frequency refers to how often a word occurs in a text and can be obtained by dividing the number of occurrences of the word in the text by the total number of words in the text; the greater the term frequency, the more important the candidate structural element is in that relevant text.
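The element division described above can be sketched in Python as follows. The function names `split_text_elements` and `build_candidate_set` are hypothetical, and treating every run of non-alphanumeric characters as punctuation is an assumption made for illustration; the text only states that division is performed according to punctuation marks.

```python
import re
from typing import Iterable, List, Set

# Assumption: any run of non-alphanumeric characters acts as a delimiter.
_DELIMITERS = re.compile(r"[^0-9A-Za-z]+")


def split_text_elements(related_text: str) -> List[str]:
    """Divide one relevant text (e.g. a package name) into text elements."""
    return [part for part in _DELIMITERS.split(related_text) if part]


def build_candidate_set(related_texts: Iterable[str]) -> Set[str]:
    """Collect the text elements of all relevant texts into a candidate set."""
    candidates: Set[str] = set()
    for text in related_texts:
        candidates.update(split_text_elements(text))
    return candidates


if __name__ == "__main__":
    print(split_text_elements("com.wzrytp.desu"))
```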
For example, the importance may be expressed by the term frequency-inverse document frequency, as in equations (1), (2), and (3). $tf\_idf_{i,j}$ represents the importance of the jth candidate structural element in the ith relevant text, $tf_{i,j}$ represents the term frequency of the jth candidate structural element in the ith relevant text, and $idf_{i,j}$ represents the inverse document frequency corresponding to the jth candidate structural element in the ith relevant text. $n_{i,j}$ represents the number of occurrences of the jth candidate structural element in the ith relevant text. $N$ represents the number of text elements in the ith relevant text. $|D|$ represents the total number of relevant texts in the relevant text set. $p_i$ represents the number of texts in the relevant text set that include the jth candidate structural element. $k$ may be any number, for example, 1.
$$tf\_idf_{i,j} = tf_{i,j} \times idf_{i,j} \tag{1}$$
$$tf_{i,j} = \frac{n_{i,j}}{N} \tag{2}$$
$$idf_{i,j} = \log \frac{|D|}{k + p_i} \tag{3}$$
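As a hedged illustration of the TF-IDF importance, the sketch below computes a score for every (text, element) pair. The function name `tf_idf` is hypothetical, and the inverse document frequency is assumed to take the common smoothed form log(|D| / (k + p)) with the natural logarithm and k = 1 by default; the term frequency follows the definition given in the text (occurrences divided by the number of elements in the text).

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tf_idf(texts: List[List[str]], k: float = 1.0) -> Dict[Tuple[int, str], float]:
    """Return the TF-IDF importance for each (text index, element) pair."""
    total_texts = len(texts)  # |D|
    # p: number of texts that contain the element at least once.
    doc_count = Counter(element for text in texts for element in set(text))
    scores: Dict[Tuple[int, str], float] = {}
    for i, text in enumerate(texts):
        occurrences = Counter(text)   # n_{i,j}
        n_elements = len(text)        # N
        for element, n_ij in occurrences.items():
            tf = n_ij / n_elements
            # Assumed smoothed IDF form; k avoids a zero denominator.
            idf = math.log(total_texts / (k + doc_count[element]))
            scores[(i, element)] = tf * idf
    return scores


if __name__ == "__main__":
    corpus = [["com", "wzrytp", "desu"], ["com", "foo", "app"], ["com", "bar", "app"]]
    for key, value in sorted(tf_idf(corpus).items()):
        print(key, round(value, 4))
```

With this form, an element such as "com" that appears in every text receives a lower score than an element that appears in only one text.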
Step S708, according to the importance of the candidate structural elements, screening the candidate structural elements meeting the importance condition from the candidate structural element set to form an intermediate structural element set, wherein the intermediate structural element set is used for determining the target corrosion structural elements.
Specifically, the importance condition includes at least one of the importance being less than a first importance or the importance ranking being lower than a first ranking. The first importance may be preset as needed, and may be, for example, 0.5. The first ranking may also be preset as needed, for example, the 50th from last. The importance values are ranked from largest to smallest, so the greater the importance, the higher the ranking. All structural elements in the intermediate structural element set may be used as target corrosion structural elements, or the target corrosion structural elements may be obtained by manual selection. For example, the 50 character strings with the lowest importance may be acquired and output to a terminal for display. The user can then make a selection as needed, for example according to the distribution and semantics of the elements in the intermediate structural element set. The terminal receives the user's selection operation on the 50 character strings and returns the selected character strings to the server, and the server takes them as the corrosion structural elements in the corrosion structural element set.
For example, for the package names of applications, com, android, cn, app, and net may be used as corrosion structural elements. In the embodiment of the application, the candidate structural elements whose importance is less than the first importance or whose importance ranking is lower than the first ranking form the intermediate structural element set, which is used to determine the target corrosion structural element. Therefore, when the target corrosion structural element is used to corrode the initial relevant text, unimportant text, that is, semantically irrelevant text, is corroded away, and the relevant text finally obtained is semantically relevant text of high importance.
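The screening step can be sketched as follows. The function name `screen_low_importance` and its parameters are hypothetical; the sketch assumes one importance value per candidate structural element, treats "at least one of" as the union of the two conditions, and breaks ties by element name purely to make the result deterministic.

```python
from typing import Dict, List, Optional


def screen_low_importance(importance: Dict[str, float],
                          first_importance: Optional[float] = None,
                          lowest_n: Optional[int] = None) -> List[str]:
    """Select candidates below a threshold and/or among the lowest-ranked n."""
    # Rank from least to most important; the name is only a tie-breaker.
    ranked = sorted(importance, key=lambda element: (importance[element], element))
    selected = set()
    if first_importance is not None:
        selected.update(e for e in ranked if importance[e] < first_importance)
    if lowest_n is not None:
        selected.update(ranked[:lowest_n])
    return [element for element in ranked if element in selected]


if __name__ == "__main__":
    scores = {"com": 0.01, "cn": 0.02, "app": 0.05, "wzrytp": 0.9, "desu": 0.8}
    print(screen_low_importance(scores, first_importance=0.5))
    print(screen_low_importance(scores, lowest_n=2))
```

The returned list corresponds to the intermediate structural element set, from which the corrosion structural elements may be taken directly or after manual selection.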
The method provided by the embodiment of the application can be applied to an application scene of determining the label of the application program, the key information in the text can be extracted through the corrosion operation of the text, and the recall information amount is increased. And the label of the application program can be obtained according to the key information, and the program recommendation or classification management can be carried out according to the label of the application program.
According to the text processing method provided by the embodiment of the application, under the condition that the installation package name of the application program is acquired, valuable information in the package name can be extracted to label the application program, so that the data value is improved, and information push is performed.
The text processing method provided in the embodiment of the present application is described below, taking a target application as the target object, and includes the following steps:
1. Acquire a relevant text set.
For example, the package names of 1000 applications may be obtained, with each package name used as one relevant text, forming a relevant text set composed of 1000 relevant texts.
2. Acquire the text elements in the relevant text set to form a candidate structural element set.
For example, the package names may be divided according to punctuation marks to obtain text elements, which form the candidate structural element set.
3. Acquire the importance, in the relevant text set, of each candidate structural element in the candidate structural element set.
For example, the tf-idf value of each candidate structural element may be calculated as its importance.
4. According to the importance of the candidate structural elements, screen the candidate structural elements meeting the importance condition from the candidate structural element set to form a corrosion structural element set.
For example, the importance condition includes at least one of the importance being less than a first importance or the importance ranking being lower than a first ranking. The 10 candidate structural elements with the lowest importance may be obtained to form the corrosion structural element set.
5. Acquire an initial relevant text corresponding to the target object.
For example, each of the 1000 applications in step 1 may be used as a target application, and the package name of the target application may be acquired.
6. Obtain the target corrosion structural element from the corrosion structural element set.
Specifically, the corrosion structural elements in the corrosion structural element set may be used in turn as the target corrosion structural element.
7. Acquire the target text element in the initial relevant text according to the position of the corrosion reference point corresponding to the target corrosion structural element.
During the initial corrosion, the initial relevant text is taken as the relevant text to be corroded; when the corrosion is not the initial corrosion, the relevant text obtained by the last corrosion is taken as the relevant text to be corroded.
8. Compare the target corrosion structural element with the target text element to obtain a target comparison result.
9. Perform corrosion treatment on the relevant text to be corroded according to the target comparison result to obtain the current relevant text.
Specifically, steps 6 to 9 may be repeated until every corrosion structural element in the corrosion structural element set has been used as the target corrosion structural element to corrode the relevant text to be corroded, which is called one round of corrosion.
10. In the current round of corrosion, take the current relevant text obtained by corrosion treatment with the last corrosion structural element in the corrosion structural element set as the relevant text obtained by the current round of corrosion.
11. Compare the relevant text obtained by the current round of corrosion with the relevant text obtained by the previous round of corrosion, and judge from the comparison result whether the current relevant text obtained by the current round has converged.
Steps 10 and 11 may be executed after each round of corrosion is finished. If the current relevant text has not converged, the process returns to step 6.
12. Take the converged current relevant text as the semantic relevant text.
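Steps 1 to 12 can be combined into a compact end-to-end sketch. Everything below is illustrative rather than the embodiment's implementation: all function names are hypothetical, package names are split on "." only, the per-element importance is assumed to be the mean TF-IDF over the texts containing the element (with idf = log(|D| / (k + p)), k = 1), and corrosion is assumed to delete a matching head or tail element.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def split(name: str) -> List[str]:
    # Step 2: divide a package name into text elements (assumed "." delimiter).
    return [part for part in name.split(".") if part]


def importance(corpus: List[List[str]], k: float = 1.0) -> Dict[str, float]:
    # Step 3: assumed aggregation, mean TF-IDF over texts containing the element.
    doc_count = Counter(e for text in corpus for e in set(text))
    totals: Dict[str, float] = defaultdict(float)
    for text in corpus:
        for element, n in Counter(text).items():
            idf = math.log(len(corpus) / (k + doc_count[element]))
            totals[element] += (n / len(text)) * idf
    return {e: totals[e] / doc_count[e] for e in totals}


def corrosion_set(corpus: List[List[str]], lowest_n: int) -> List[str]:
    # Step 4: keep the lowest_n least important candidates.
    scores = importance(corpus)
    return sorted(scores, key=lambda e: (scores[e], e))[:lowest_n]


def corrode_to_convergence(elements: List[str], structural: List[str]) -> List[str]:
    # Steps 6 to 12: repeat rounds until a round leaves the text unchanged.
    current = list(elements)
    while True:
        previous = current
        for se in structural:
            if current and current[0] == se:
                current = current[1:]
            if current and current[-1] == se:
                current = current[:-1]
        if current == previous:
            return current


def semantic_texts(package_names: List[str], lowest_n: int) -> Dict[str, str]:
    corpus = [split(name) for name in package_names]      # steps 1 and 2
    structural = corrosion_set(corpus, lowest_n)           # steps 3 and 4
    return {name: ".".join(corrode_to_convergence(text, structural))
            for name, text in zip(package_names, corpus)}  # steps 5 to 12


if __name__ == "__main__":
    names = ["com.wzrytp.desu", "com.foo.app", "com.bar.app",
             "com.baz.app", "cn.qux.app", "com.quux.net"]
    print(semantic_texts(names, lowest_n=2))
```

In this toy corpus, "com" and "app" occur in the most package names, so they receive the lowest importance and are the elements removed from the ends of each name.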
It should be understood that, although the steps in the above-described flowcharts are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In some embodiments, as shown in fig. 8, there is provided a text processing apparatus, which may be implemented as part of a computer device by a software module, a hardware module, or a combination of the two, and which specifically includes: an initial relevant text obtaining module 802, a target corrosion structural element obtaining module 804, a target text element obtaining module 806, a comparison module 808, and a corrosion module 810, wherein:
an initial relevant text obtaining module 802, configured to obtain an initial relevant text corresponding to the target object.
A target corrosion structural element obtaining module 804, configured to obtain a target corrosion structural element.
A target text element obtaining module 806, configured to obtain a target text element in the initial relevant text according to a position of the corrosion reference point corresponding to the target corrosion structural element.
A comparison module 808, configured to compare the target corrosion structural element with the target text element to obtain a target comparison result.
A corrosion module 810, configured to, when the target comparison result is that the comparison is consistent, perform corrosion processing on the initial relevant text to obtain a semantic relevant text corresponding to the target object.
In some embodiments, the target text element acquisition module comprises: a current character determining unit, configured to determine a current character in the relevant text to be corroded and take the position of the current character as the position of the corrosion reference point corresponding to the target corrosion structural element, wherein during the initial corrosion the initial relevant text is taken as the relevant text to be corroded, and when the corrosion is not the initial corrosion the relevant text obtained by the last corrosion is taken as the relevant text to be corroded; and a target text element acquisition unit, configured to acquire, according to the relative position relation between the target corrosion structural element and the corrosion reference point, the text element corresponding to the position of the target corrosion structural element in the relevant text to be corroded as the target text element.
In some embodiments, the target text element acquisition unit is configured to: when the relative position relation is that the corrosion reference point is behind the target corrosion structural element, acquire the head text element corresponding to the position of the target corrosion structural element in the relevant text to be corroded as the target text element.
In some embodiments, the target text element acquisition unit is configured to: when the relative position relation is that the corrosion reference point is in front of the target corrosion structural element, acquire the tail text element corresponding to the position of the target corrosion structural element in the relevant text to be corroded as the target text element.
In some embodiments, the target text element acquisition unit is configured to: when the relative position relation is that the corrosion reference point is within the target corrosion structural element, take the text element located in front of the corrosion reference point in the target corrosion structural element as a first structural text element, and take the text element located behind the corrosion reference point in the target corrosion structural element as a second structural text element; and acquire, from the relevant text to be corroded, a head text element corresponding to the position of the first structural text element and a tail text element corresponding to the position of the second structural text element, and take the head text element and the tail text element as the target text elements. The comparison module is configured to: compare the head text element with the first structural text element to obtain a first comparison result, and compare the tail text element with the second structural text element to obtain a second comparison result; and obtain the target comparison result according to the first comparison result and the second comparison result.
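The three relative positions of the corrosion reference point can be illustrated with one small function. This is a sketch under assumptions, not the embodiment's implementation: the name `corrode_with_reference` is hypothetical, the structural element is modeled as a sequence of text elements, the reference point is modeled as an index `ref` into that sequence (`ref == len(structural)` means the point is behind the element, `ref == 0` means it is in front, anything between means it is inside), and the two partial comparison results are assumed to be combined with a logical AND.

```python
from typing import List, Sequence


def corrode_with_reference(text: List[str],
                           structural: Sequence[str],
                           ref: int) -> List[str]:
    """Delete matching head/tail elements depending on the reference point."""
    if not 0 <= ref <= len(structural):
        raise ValueError("reference point must lie within 0..len(structural)")
    first = list(structural[:ref])    # part in front of the reference point
    second = list(structural[ref:])   # part behind the reference point
    if len(first) + len(second) > len(text):
        return list(text)             # structural element does not fit
    head = text[:len(first)]                 # compared with the first part
    tail = text[len(text) - len(second):]    # compared with the second part
    # Assumed combination rule: both partial comparisons must be consistent.
    if head == first and tail == second:
        return text[len(first):len(text) - len(second)]
    return list(text)


if __name__ == "__main__":
    print(corrode_with_reference(["com", "wzrytp", "desu"], ["com"], 1))
    print(corrode_with_reference(["wzrytp", "desu", "app"], ["app"], 0))
    print(corrode_with_reference(["com", "wzrytp", "app"], ["com", "app"], 1))
```

When the reference point is behind or in front of the structural element, one of the two parts is empty, so only the head or only the tail is compared.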
In some embodiments, the targeted corrosion structural element acquisition module is to: obtaining a target corrosion structural element from the corrosion structural element set; the corrosion module is used for: when the target comparison result is that the comparison is consistent, carrying out corrosion treatment on the relevant text to be corroded, and obtaining the current relevant text; wherein, during the primary corrosion, the initial relevant text is taken as the relevant text to be corroded; and taking the current relevant text as the relevant text to be corroded, entering the step of acquiring the target corrosion structural element from the corrosion structural element set until the current relevant text is converged, and taking the converged current relevant text as the semantic relevant text corresponding to the target object.
In some embodiments, the module for determining the current relevant text convergence is for: in the current round of corrosion, a related text obtained by corrosion treatment by using the last corrosion structural element in the corrosion structural element set is used as the related text obtained by the current round of corrosion; and comparing the related text obtained by the corrosion of the current round with the related text obtained by the corrosion of the previous round, and determining that the current related text is converged when the comparison is consistent.
In some embodiments, the text processing apparatus further comprises: a relevant text set acquisition module for acquiring a relevant text set; the candidate structural element set acquisition module is used for acquiring text elements in the related text set to form a candidate structural element set; the importance acquisition module is used for acquiring the importance of each candidate structural element in the candidate structural element set in the relevant text set; the screening module is used for screening candidate structural elements meeting the importance degree condition from the candidate structural element set according to the importance degree of the candidate structural elements to form an intermediate structural element set, and the intermediate structural element set is used for determining target corrosion structural elements; the importance condition includes at least one of an importance being less than a first importance or an importance ranking being lower than the first ranking.
In some embodiments, the target object comprises a target application, and the relevant text set acquisition module is configured to: acquire the application package names respectively corresponding to the applications in an application set to form the relevant text set. The initial relevant text acquisition module is configured to: take an application whose tag is to be determined as the target application, and acquire the application package name corresponding to the target application as the initial relevant text corresponding to the target application.
In some embodiments, the text processing apparatus further comprises: an object tag acquisition module, configured to acquire the target semantics corresponding to the semantic relevant text and obtain the object tag of the target object according to the target semantics.
In some embodiments, the target object includes a target application, the object tag corresponding to the target object includes a program tag corresponding to the target application, and the apparatus further includes: the push application program determining module is used for taking the application program corresponding to the program label as a push application program; and the pushing module is used for determining a target terminal provided with a target application program and pushing program related information corresponding to the pushed application program to the target terminal.
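The push logic can be sketched as a simple lookup. The function name `plan_pushes` and both input mappings are hypothetical, and two details are assumptions not stated in the text: the target application itself is excluded from the pushed applications, and an application already installed on a terminal is not pushed to that terminal.

```python
from typing import Dict, List, Set, Tuple


def plan_pushes(target_app: str,
                program_tag: str,
                apps_by_tag: Dict[str, Set[str]],
                installed_by_terminal: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (terminal, push application) pairs for one tagged target app."""
    # Applications sharing the program tag are the push candidates.
    push_apps = sorted(apps_by_tag.get(program_tag, set()) - {target_app})
    pushes: List[Tuple[str, str]] = []
    for terminal in sorted(installed_by_terminal):
        installed = installed_by_terminal[terminal]
        if target_app not in installed:
            continue  # only terminals with the target application installed
        for app in push_apps:
            if app not in installed:  # assumption: skip already-installed apps
                pushes.append((terminal, app))
    return pushes


if __name__ == "__main__":
    tags = {"game": {"app_a", "app_b", "app_c"}}
    terminals = {"t1": {"app_a"}, "t2": {"app_b"}, "t3": {"app_a", "app_c"}}
    print(plan_pushes("app_a", "game", tags, terminals))
```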
For the specific limitations of the text processing apparatus, reference may be made to the limitations of the text processing method above, which are not repeated here. Each module in the text processing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.
In some embodiments, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 9. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store the relevant text sets. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a text processing method.
It will be appreciated by those skilled in the art that the configuration shown in fig. 9 is a block diagram of only a portion of the configuration associated with the present application, and is not intended to limit the computing device to which the present application may be applied, and that a particular computing device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In some embodiments, there is further provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the above method embodiments when executing the computer program.
In some embodiments, a computer-readable storage medium is provided, in which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to be within the scope of the present disclosure.
The above-mentioned embodiments express only several embodiments of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (13)

1. A method of text processing, the method comprising:
acquiring an initial related text corresponding to a target object;
obtaining target corrosion structural elements;
determining a current character in a relevant text to be corroded, and taking the position of the current character as the position of a corrosion reference point corresponding to the target corrosion structural element; wherein during the initial corrosion, the initial relevant text is taken as the relevant text to be corroded, and when the corrosion is not the initial corrosion, the relevant text obtained by the last corrosion is taken as the relevant text to be corroded;
acquiring a text element corresponding to the position of the target corrosion structural element in the relevant text to be corroded according to the relative position relation between the target corrosion structural element and the corrosion reference point, and taking the text element as a target text element in the initial relevant text;
comparing the target corrosion structure element with the target text element to obtain a target comparison result;
when the target comparison result is that the comparison is consistent, performing corrosion processing on the initial relevant text to obtain a semantic relevant text corresponding to the target object, including: when the target comparison result is that the comparison is consistent, performing corrosion treatment on the target text element, that is, deleting the target text element from the relevant text to be corroded, to obtain the semantic relevant text corresponding to the target object.
2. The method according to claim 1, wherein the obtaining, according to the relative position relationship between the target corrosion structural element and the corrosion reference point, a text element corresponding to the position of the target corrosion structural element in the relevant text to be corroded as the target text element in the initial relevant text comprises:
and when the relative position relation is that the corrosion reference point is behind the target corrosion structural element, acquiring a head text element corresponding to the position of the target corrosion structural element in the relevant text to be corroded as the target text element.
3. The method according to claim 1, wherein the obtaining, according to the relative position relationship between the target corrosion structural element and the corrosion reference point, a text element corresponding to the position of the target corrosion structural element in the relevant text to be corroded as the target text element in the initial relevant text comprises:
and when the relative position relationship is that the corrosion reference point is in front of the target corrosion structural element, acquiring a tail text element corresponding to the position of the target corrosion structural element in the relevant text to be corroded as the target text element.
4. The method according to claim 1, wherein the obtaining, according to the relative position relationship between the target corrosion structural element and the corrosion reference point, a text element corresponding to the position of the target corrosion structural element in the relevant text to be corroded as the target text element in the initial relevant text comprises:
when the relative position relation is that the corrosion reference point is within the target corrosion structural element, taking a text element positioned in front of the corrosion reference point in the target corrosion structural element as a first structural text element, and taking a text element positioned behind the corrosion reference point in the target corrosion structural element as a second structural text element;
acquiring a head text element corresponding to the position of the first structural text element and a tail text element corresponding to the position of the second structural text element from the to-be-corroded related text, and taking the head text element and the tail text element as the target text elements;
the step of comparing the target corrosion structure element with the target text element to obtain a target comparison result comprises the following steps:
comparing the head text element with the first structure text element to obtain a first comparison result, and comparing the tail text element with the second structure text element to obtain a second comparison result;
and obtaining the target comparison result according to the first comparison result and the second comparison result.
5. The method of claim 1, wherein said obtaining a target corrosion structural element comprises:
obtaining target corrosion structural elements from the corrosion structural element set;
when the target comparison result is consistent, performing corrosion processing on the initial relevant text to obtain a semantic relevant text corresponding to the target object comprises:
when the target comparison result is consistent, carrying out corrosion treatment on the relevant text to be corroded, and obtaining the current relevant text; when the initial relevant texts are corroded for the first time, the initial relevant texts are used as the relevant texts to be corroded;
and taking the current relevant text as the relevant text to be corroded, entering the step of acquiring the target corrosion structural element from the corrosion structural element set until the current relevant text is converged, and taking the converged current relevant text as the semantic relevant text corresponding to the target object.
6. The method of claim 5, wherein determining that the current relevant text is converged comprises:
in the current round of corrosion, a relevant text obtained by corrosion treatment by using the last corrosion structural element in the corrosion structural element set is used as a relevant text obtained by the current round of corrosion;
and comparing the related text obtained by the current round of corrosion with the related text obtained by the previous round of corrosion, and determining that the current related text is converged when the comparison is consistent.
7. The method of claim 1, further comprising:
acquiring a relevant text set;
acquiring text elements in the relevant text set to form a candidate structural element set;
acquiring, for each candidate structural element in the candidate structural element set, its importance in the relevant text set;
screening, according to the importance of the candidate structural elements, candidate structural elements meeting an importance condition from the candidate structural element set to form an intermediate structural element set, wherein the intermediate structural element set is used for determining the target corrosion structural element;
the importance condition includes at least one of an importance being less than a first importance or an importance ranking being lower than a first ranking.
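The screening step of claim 7 can be sketched as follows. The claim does not fix an importance measure here, so this sketch assumes inverse document frequency; it also assumes that text elements are obtained by splitting each relevant text on ".", which suits package-name-like strings. The function name, the threshold parameter, and the sample data are hypothetical.

```python
import math
from collections import Counter


def select_structural_elements(texts: list[str], first_importance: float) -> list[str]:
    """Keep candidate elements whose importance is below a threshold.

    Importance is assumed to be inverse document frequency, so elements
    that occur in many texts score low and are selected for corrosion."""
    doc_freq = Counter()
    for text in texts:
        # Count each element at most once per text.
        doc_freq.update(set(text.split(".")))
    total = len(texts)
    importance = {elem: math.log(total / df) for elem, df in doc_freq.items()}
    return sorted(e for e, score in importance.items() if score < first_importance)


names = ["com.tencent.mm", "com.tencent.weather", "com.example.notes"]
print(select_structural_elements(names, first_importance=0.5))
# ['com', 'tencent']
```

Selecting low-importance elements reflects the idea that fragments shared by many texts carry little meaning about any single target object, so removing them leaves the semantically relevant part.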
8. The method of claim 7, wherein the target object comprises a target application, and wherein obtaining the set of relevant text comprises:
acquiring the application program package names respectively corresponding to the application programs in an application program set to form the relevant text set;
the acquiring of the initial relevant text corresponding to the target object includes:
taking an application program whose tag is to be determined as the target application program, and acquiring the application program package name corresponding to the target application program as the initial relevant text corresponding to the target application program.
9. The method of claim 1, further comprising:
acquiring target semantics corresponding to the semantic relevant text, and acquiring an object tag of the target object according to the target semantics.
10. The method of claim 9, wherein the target object comprises a target application, wherein the object tag corresponding to the target object comprises a program tag corresponding to the target application, and wherein the method further comprises:
acquiring an application program corresponding to the program tag as a push application program;
and determining a target terminal on which the target application program is installed, and pushing program related information corresponding to the push application program to the target terminal.
11. A text processing apparatus, the apparatus comprising:
the initial relevant text acquisition module is used for acquiring an initial relevant text corresponding to the target object;
the target corrosion structural element acquisition module is used for acquiring a target corrosion structural element;
the target text element acquisition module is used for determining a current character in a relevant text to be corroded and taking the position of the current character as the position of a corrosion reference point corresponding to the target corrosion structural element, wherein, when the corrosion is not the first corrosion of the initial relevant text, the relevant text obtained by the previous corrosion is used as the relevant text to be corroded; and for acquiring, according to the relative position relation between the target corrosion structural element and the corrosion reference point, the text element at the position covered by the target corrosion structural element in the relevant text to be corroded, and taking that text element as the target text element in the initial relevant text;
the comparison module is used for comparing the target corrosion structural element with the target text element to obtain a target comparison result;
the corrosion module is used for performing corrosion processing on the initial relevant text when the target comparison result is consistent, to obtain a semantic relevant text corresponding to the target object, which comprises: when the target comparison result is consistent, performing corrosion processing on the target text element, namely deleting the target text element from the relevant text to be corroded, to obtain the semantic relevant text corresponding to the target object.
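The cooperation of the target text element acquisition, comparison, and corrosion modules can be sketched as a single pass over the text. This is a hedged illustration, not the claimed apparatus: the function name `erode` and the `origin` parameter (the index inside the structural element that is aligned with the current character, i.e. the corrosion reference point) are assumptions made for the example.

```python
def erode(text: str, element: str, origin: int = 0) -> str:
    """Slide a structural element over the text and delete every window
    that matches it exactly."""
    if not element:
        return text
    if not 0 <= origin < len(element):
        raise ValueError("origin must index a character of the element")
    keep = [True] * len(text)
    i = 0  # position of the current character (the reference point)
    while i < len(text):
        # Locate the window covered by the element relative to the
        # reference point, then compare it with the element.
        start = i - origin
        end = start + len(element)
        if start >= 0 and text[start:end] == element:
            # Comparison is consistent: delete the target text element.
            for j in range(start, end):
                keep[j] = False
            i = end + origin  # next window starts after the deleted one
        else:
            i += 1
    return "".join(ch for ch, kept in zip(text, keep) if kept)


print(erode("com.tencent.weather", "com."))                # tencent.weather
print(erode("com.tencent.weather", "tencent.", origin=3))  # com.weather
```

With `origin=0` the reference point is the first character of the element; other values only shift which character of the text is treated as the current one and do not change which windows are deleted.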
12. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any one of claims 1 to 10 when executing the computer program.
13. A computer-readable storage medium, storing a computer program, characterized in that the computer program, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 10.
CN202010289730.3A 2020-04-14 2020-04-14 Text processing method and device, computer equipment and storage medium Active CN111476037B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010289730.3A CN111476037B (en) 2020-04-14 2020-04-14 Text processing method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010289730.3A CN111476037B (en) 2020-04-14 2020-04-14 Text processing method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111476037A CN111476037A (en) 2020-07-31
CN111476037B true CN111476037B (en) 2023-03-31

Family

ID=71752122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010289730.3A Active CN111476037B (en) 2020-04-14 2020-04-14 Text processing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111476037B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004078869A (en) * 2002-08-20 2004-03-11 Joho Bunka Sogo Kenkyusho:Kk Computer program for extracting keyword from sentence described in japanese ,and computer readable recording medium recorded with the computer program
CN101030221A (en) * 2007-04-13 2007-09-05 清华大学 Large-scale and multi-key word matching method for text or network content analysis
CN101369278A (en) * 2008-09-27 2009-02-18 成都市华为赛门铁克科技有限公司 Approximate adaptation method and apparatus
JP2010102371A (en) * 2008-10-21 2010-05-06 Nippon Telegr & Teleph Corp <Ntt> Emoticon detecting device, emoticon detection method, program and recording medium
CN104750673A (en) * 2013-12-31 2015-07-01 中国移动通信集团公司 Text matching and filtering method and text matching and filtering device
CN105068989A (en) * 2015-07-23 2015-11-18 中国测绘科学研究院 Place name and address extraction method and apparatus
CN105677757A (en) * 2015-12-30 2016-06-15 东北大学 Big data similarity join method based on prefix-affix filtering
CN105843950A (en) * 2016-04-12 2016-08-10 乐视控股(北京)有限公司 Sensitive word filtering method and device
CN108733828A (en) * 2018-05-24 2018-11-02 北京金堤科技有限公司 Extracting method, device and the computer-readable medium of Business Name
CN108920483A (en) * 2018-04-28 2018-11-30 南京搜文信息技术有限公司 Character string fast matching method based on Suffix array clustering
CN109101491A (en) * 2018-07-24 2018-12-28 湖南星汉数智科技有限公司 A kind of author information abstracting method, device, computer installation and computer readable storage medium
CN110008474A (en) * 2019-04-04 2019-07-12 科大讯飞股份有限公司 A kind of key phrase determines method, apparatus, equipment and storage medium
CN110110198A (en) * 2017-12-28 2019-08-09 中移(苏州)软件技术有限公司 A kind of method for abstracting web page information and device
CN110688841A (en) * 2019-09-30 2020-01-14 广州准星信息科技有限公司 Mechanism name identification method, mechanism name identification device, mechanism name identification equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7956844B2 (en) * 2006-04-07 2011-06-07 Research In Motion Limited Handheld electronic device providing a learning function to facilitate correction of erroneous text entry in environment of text requiring multiple sequential actuations of the same key, and associated method
CN102053993B (en) * 2009-11-10 2014-04-09 阿里巴巴集团控股有限公司 Text filtering method and text filtering system
CN102779176A (en) * 2012-06-27 2012-11-14 北京奇虎科技有限公司 System and method for key word filtering
CN104615585B (en) * 2014-01-06 2017-07-21 腾讯科技(深圳)有限公司 Handle the method and device of text message

Also Published As

Publication number Publication date
CN111476037A (en) 2020-07-31

Similar Documents

Publication Publication Date Title
US11017178B2 (en) Methods, devices, and systems for constructing intelligent knowledge base
CN110162593B (en) Search result processing and similarity model training method and device
CN109918560B (en) Question and answer method and device based on search engine
CN112819023B (en) Sample set acquisition method, device, computer equipment and storage medium
CN111444320A (en) Text retrieval method and device, computer equipment and storage medium
US11775594B2 (en) Method for disambiguating between authors with same name on basis of network representation and semantic representation
CN113392209B (en) Text clustering method based on artificial intelligence, related equipment and storage medium
CN111325030A (en) Text label construction method and device, computer equipment and storage medium
CN112115232A (en) Data error correction method and device and server
CN110929526A (en) Sample generation method and device and electronic equipment
CN111476037B (en) Text processing method and device, computer equipment and storage medium
CN111507098B (en) Ambiguous word recognition method and device, electronic equipment and computer-readable storage medium
CN115455169A (en) Knowledge graph question-answering method and system based on vocabulary knowledge and semantic dependence
CN114398903A (en) Intention recognition method and device, electronic equipment and storage medium
CN103744830A (en) Semantic analysis based identification method of identity information in EXCEL document
CN114398482A (en) Dictionary construction method and device, electronic equipment and storage medium
CN113761192A (en) Text processing method, text processing device and text processing equipment
CN112269877A (en) Data labeling method and device
CN116401305A (en) Information processing method, device and system
CN110765239B (en) Hot word recognition method, device and storage medium
CN116992111B (en) Data processing method, device, electronic equipment and computer storage medium
CN113312523B (en) Dictionary generation and search keyword recommendation method and device and server
CN113268566B (en) Question and answer pair quality evaluation method, device, equipment and storage medium
CN117034928A (en) Model construction method, device, equipment and storage medium
CN116431774A (en) Question answering method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40025845; Country of ref document: HK)
GR01 Patent grant