Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not restrictive of it. It should also be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary architecture 100 to which embodiments of the disclosed method for generating information or apparatus for generating information may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The terminal devices 101, 102, 103 interact with the server 105 via the network 104 to receive or send messages and the like. Various client applications, such as browser applications, reading applications, content-sharing applications, search applications, and social platform applications, may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices, including but not limited to smart phones, tablet computers, e-book readers, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above and may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules for providing distributed services) or as a single piece of software or software module, which is not specifically limited herein.
The server 105 may be a server that provides various services, such as a backend server that provides support for client applications installed on the terminal devices 101, 102, 103. The server 105 may determine similarity of the text displayed on the terminal device and at least one piece of comment information of the text, and generate quality information of the text according to the similarity. Further, quality information of the generated text may also be stored in association with the text.
It should be noted that the text and the at least one piece of comment information of the text may also be stored directly in a local database of the server 105 or in a database corresponding to the server 105. In this case, the server 105 may directly extract and process the text and its at least one piece of comment information from the local or corresponding database, and the terminal devices 101, 102, 103 and the network 104 may be omitted.
It should be noted that the method for generating information provided by the embodiment of the present disclosure is generally performed by the server 105, and accordingly, the apparatus for generating information is generally disposed in the server 105.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules for providing distributed services) or as a single piece of software or software module, which is not specifically limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating information in accordance with the present disclosure is shown. The method for generating information comprises the following steps:
step 201, obtaining a text to be processed.
In this embodiment, an executing entity of the method for generating information (e.g., the server 105 shown in fig. 1) may first acquire the text to be processed from a local storage device or another storage device (e.g., the terminal devices 101, 102, 103 shown in fig. 1). The executing entity may also obtain the text to be processed from a corresponding database or a third-party data platform.
The text to be processed may be any of various texts. The text to be processed may be a text designated by a technician, or a text for which a user has sent a processing instruction. The text to be processed may also be a text determined according to a preset condition. In different application scenarios, the text to be processed may differ.
Step 202, at least one piece of comment information of the text to be processed is obtained.
In this embodiment, after the text to be processed is determined, comment information of the text to be processed may further be acquired. The comment information may refer to information that analyzes and evaluates the text to be processed, and may be used to set forth a viewpoint or attitude.
Similarly, the executing entity may obtain the at least one piece of comment information of the text to be processed from a local storage device, another storage device, a corresponding database, or a third-party data platform. Generally, the text to be processed and its comment information can be stored in association with each other.
Because the text to be processed may have many pieces of comment information, part or all of the comment information of the text to be processed may be acquired according to the actual application requirements.
Step 203, determining the similarity between the text to be processed and at least one piece of comment information as the target similarity.
In this embodiment, the similarity between the text to be processed and the at least one piece of comment information may be determined by using various existing text similarity determination methods, such as keyword-matching-based algorithms, vector-space-based algorithms, and deep-learning-based algorithms. The keyword-matching-based algorithms include, for example, N-Gram matching. The vector-space-based algorithms include, for example, TF-IDF (Term Frequency-Inverse Document Frequency) and Word2Vec (Word to Vector). The deep-learning-based algorithms include, for example, DSSM (Deep Structured Semantic Model).
Alternatively, the at least one piece of comment information may be merged into one text, and the similarity between the text to be processed and the merged comment text, determined by any of the text similarity determination methods described above, may then be used as the similarity between the text to be processed and the at least one piece of comment information.
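As a purely illustrative, non-limiting sketch of the vector-space-based approach described above, the following Python snippet merges the at least one piece of comment information into one text and computes a cosine similarity over TF-IDF feature vectors. It assumes the availability of the scikit-learn library; the function name is hypothetical and not part of the disclosed embodiments.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def merged_comment_similarity(text_to_process, comments):
    # Merge all pieces of comment information into a single text.
    merged_comments = " ".join(comments)
    # Represent the text to be processed and the merged comment text as
    # TF-IDF feature vectors over a shared vocabulary.
    vectors = TfidfVectorizer().fit_transform([text_to_process, merged_comments])
    # Cosine similarity of the two vectors; with TF-IDF weights the value
    # lies in [0, 1].
    return float(cosine_similarity(vectors[0], vectors[1])[0][0])
```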
Optionally, the similarity between each piece of comment information in the at least one piece of comment information and the text to be processed may be determined respectively to obtain a similarity set; the weight value of each piece of comment information in the at least one piece of comment information may be determined respectively; and the weighted average of the similarities in the similarity set may be determined as the target similarity.
The weight value of each piece of comment information may be specified in advance by a technician, or may be determined according to the related information of each piece of comment information. For example, different weight values may be set according to the comment time corresponding to each piece of comment information.
Optionally, for each piece of comment information in the at least one piece of comment information, statistical information of the user operations corresponding to the comment information may be acquired, and the weight of the comment information may then be determined according to the statistical information.
The user operation may refer to various interactive operations between the user and the comment information. It should be understood that different terminal devices used by the user, different platforms displaying comment information, and the like may have different forms of user operations.
For example, the user operation may be a comment operation on the comment information, an operation of sharing the comment information to another page, an operation of clicking a control for indicating an attitude toward the comment information (such as approval or disapproval), and the like.
The statistical information of the user operation may refer to some statistical data obtained after the user operation is processed by using a statistical method. For example, the statistical information of the user operations may be the total number of various user operations received by the comment information, the total number of comment operations received by the comment information, and the like.
In some application scenarios, a higher weight value may be set for comment information that has received a greater total number of user operations.
Using the weighted average makes the quality evaluation of the text to be processed a relative value, which facilitates quality ranking and other processing together with other texts to be processed.
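The weighted-average calculation described above may be sketched as follows. The mapping from the statistical information of user operations to weight values (here, a weight proportional to the operation count plus one) is only one possible scheme assumed for illustration, not the only one contemplated:

```python
def target_similarity(similarities, operation_counts):
    # similarities: similarity between each piece of comment information and
    #               the text to be processed (the similarity set).
    # operation_counts: total number of user operations received by each
    #                   piece of comment information (the statistical info).
    # One possible scheme: the weight grows with the operation count; adding 1
    # avoids an all-zero weight vector.
    weights = [count + 1 for count in operation_counts]
    total = sum(weights)
    # The weighted average of the similarities is the target similarity.
    return sum(s * w for s, w in zip(similarities, weights)) / total
```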
In practice, various text similarity determination methods can be flexibly selected and used according to different application scenes or service requirements.
And step 204, generating quality information of the text to be processed according to the target similarity.
In this embodiment, the quality information may be used to characterize the quality of the text to be processed. In different application scenarios, the quality information may be represented in various ways. For example, the quality information may be a specific numerical value representing a quality score. As another example, the quality information may be a preset level identifier indicating a degree of quality.
Generally, a text of good quality will give rise to more analysis or discussion about the text. It can therefore be considered that the comment information of a good-quality text generally has a high similarity to the text. Conversely, if the comment information is unrelated to the text, it can be considered that the text has not generated much response, and the quality of the text may be poor.
Based on this, the quality of the text to be processed can be evaluated based on the similarity of the text to be processed and the comment information thereof. Generally, it can be considered that the higher the similarity of the text to be processed and its comment information, the higher the quality of the text.
Based on this direct-proportion relationship between the target similarity and the quality of the text to be processed, different methods for generating the quality information can be set according to how the quality information is represented. For example, if a specific numerical value is used to represent the quality of the text to be processed, the target similarity may be used directly as the quality score of the text to be processed. In this case, the higher the quality score, the higher the quality of the text to be processed.
As another example, if a level identifier is used to indicate the quality of the text to be processed, the correspondence between similarity intervals and the different level identifiers may be set in advance. For instance, three level identifiers "A", "B", and "C" may be used to represent the quality of the text to be processed, with each of "A", "B", and "C" corresponding to a different similarity interval. The level identifier corresponding to the similarity interval in which the determined target similarity falls may then be generated as the quality information of the text to be processed.
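A minimal sketch of the two representations of quality information described above is given below; the similarity-interval boundaries 0.6 and 0.3 are hypothetical values chosen for illustration only:

```python
def quality_info(target_similarity, use_levels=False):
    # Option 1: use the target similarity directly as the quality score.
    if not use_levels:
        return target_similarity
    # Option 2: map the target similarity to a preset level identifier.
    # The interval boundaries below are illustrative only.
    if target_similarity >= 0.6:
        return "A"
    if target_similarity >= 0.3:
        return "B"
    return "C"
```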
With continued reference to fig. 3, fig. 3 is a schematic diagram 300 of an application scenario of the method for generating information according to the present embodiment. In the application scenario of fig. 3, the executing entity may obtain the text to be processed 301 from the corresponding database, and then obtain three pieces of comment information (as shown by reference numerals 302, 303, and 304 in the figure) corresponding to the text to be processed 301.
Then, the text to be processed 301 and the three pieces of comment information may be represented as corresponding feature vectors, respectively, based on a VSM (Vector Space Model). Then, the similarity of the feature vector of the text to be processed 301 and the feature vectors of the three pieces of comment information may be calculated, respectively.
Then, as indicated by reference numeral 305 in the figure, an average value of the similarity degrees corresponding to the obtained three pieces of comment information, respectively, may be determined as the quality score of the text to be processed 301.
The method provided by the above embodiment of the present disclosure evaluates the quality of a text according to the similarity between the text and its comments. In this way, a concrete representation of the quality of the text can be obtained, the number of features that can be used to characterize the text is increased, and the text-quality feature can further be applied to text-related analysis and processing.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for generating information is shown. The flow 400 of the method for generating information comprises the steps of:
step 401, obtaining a text to be processed.
Step 402, at least one piece of comment information of the text to be processed is obtained.
The specific implementation process of steps 401 and 402 can refer to the related description of steps 201 and 202 in the corresponding embodiment of fig. 2, and is not repeated herein.
Step 403, splitting the text to be processed into at least two sub-texts.
In this embodiment, different splitting modes may be selected according to different application scenarios and service requirements. For example, the text to be processed may be split according to paragraphs, and each paragraph of the text to be processed is taken as one sub-text. For another example, the text to be processed may be split according to sentences, and each sentence of the text to be processed is taken as one sub-text.
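As an illustration of the two splitting modes mentioned above, the following sketch splits a text into sub-texts either by paragraph or by sentence; the regular expressions are simple assumptions and would in practice be adapted to the language and format of the text:

```python
import re

def split_text(text, by="paragraph"):
    if by == "paragraph":
        # Each block separated by one or more blank lines is one sub-text.
        parts = re.split(r"\n\s*\n", text)
    else:
        # Each sentence, ended by a period, exclamation mark, or question
        # mark (half- or full-width), is one sub-text.
        parts = re.split(r"(?<=[.!?。！？])\s*", text)
    return [p.strip() for p in parts if p.strip()]
```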
Step 404, for each sub-text in the at least two sub-texts, the following steps 4041 to 4043 are executed:
step 4041, determining similarity between the sub-document and the comment information in the at least one piece of comment information, and obtaining a similarity set corresponding to the sub-document.
Step 4042, determining the weight values of the comment information in the at least one piece of comment information respectively.
Step 4043, determine the weighted average of the similarities in the similarity set corresponding to the sub-document as the target similarity corresponding to the sub-document.
The specific implementation process of steps 4041, 4042, and 4043 may refer to the related description of step 203 in the corresponding embodiment of fig. 2, and will not be described herein again.
Step 405, determining the target similarity corresponding to the text to be processed according to the target similarity corresponding to each of the sub-texts in the at least two sub-texts.
In this embodiment, the target similarity corresponding to the text to be processed may be determined by comprehensively considering the target similarity corresponding to each sub-text.
Optionally, an average value of the target similarities corresponding to the sub-texts in the at least two sub-texts may be determined as the target similarity corresponding to the text to be processed.
Optionally, the maximum value of the target similarities corresponding to the sub-texts in the at least two sub-texts may be determined as the target similarity corresponding to the text to be processed.
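A brief sketch of the two aggregation options described above (average or maximum of the target similarities of the sub-texts) might look as follows:

```python
def text_target_similarity(subtext_similarities, mode="mean"):
    # subtext_similarities: target similarity computed for each sub-text.
    if mode == "max":
        return max(subtext_similarities)
    return sum(subtext_similarities) / len(subtext_similarities)
```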
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for generating information in this embodiment highlights the steps of splitting the text to be processed into at least two sub-texts and then determining the quality information of the text to be processed by comprehensively considering the similarity between each sub-text and each piece of comment information. Therefore, when the content of a text to be processed is long or its content is poorly coherent, the text can be processed after being split, which improves the accuracy of the determined quality information of the text to be processed.
With continued reference to fig. 5, a flow 500 of one embodiment of a method for pushing information in accordance with the present disclosure is shown. The method for pushing the information comprises the following steps:
step 501, obtaining a candidate pushed text set.
In this embodiment, an executing entity (e.g., the server 105 shown in fig. 1) of the method for pushing information may first obtain a candidate pushed text set from a corresponding database or other data platform. The candidate pushed texts may be various texts that can be pushed.
Step 502, for a candidate pushed text in the candidate pushed text set, generating quality information of the candidate pushed text.
In the present embodiment, the quality information of each candidate pushed text may be generated by using the method for generating information described in the embodiments corresponding to fig. 2 and fig. 4.
Step 503, selecting a candidate pushed text with corresponding quality information meeting a preset condition from the candidate pushed text set, and pushing the selected candidate pushed text.
In this step, the preset condition may be preset by a technician according to application requirements. For example, when the quality information is represented by a specific numerical value, the preset condition may be that the quality information is greater than a preset threshold.
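As a hedged illustration of this step, the following sketch selects the candidate pushed texts whose quality scores exceed a threshold; the threshold value of 0.5 and the function name are hypothetical:

```python
def select_texts_to_push(candidate_texts, quality_scores, threshold=0.5):
    # Keep only the candidate pushed texts whose quality information (here a
    # numerical quality score) meets the preset condition.
    return [text for text, score in zip(candidate_texts, quality_scores)
            if score > threshold]
```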
The method provided by this embodiment of the disclosure constrains the quality of the text according to the preset condition, thereby screening the candidate pushed texts in the candidate pushed text set and filtering out the candidate pushed texts that do not meet the preset condition. This effectively reduces the number of pushed texts and reduces the traffic consumed by the terminal devices receiving the pushed texts. At the same time, this approach increases the exposure of higher-quality texts and reduces the exposure of lower-quality texts.
With further reference to fig. 6, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for generating information, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable in various electronic devices.
As shown in fig. 6, the apparatus 600 for generating information provided by the present embodiment includes an acquisition unit 601, a determination unit 602, and a generation unit 603. Wherein the obtaining unit 601 is configured to obtain a text to be processed; the obtaining unit 601 is further configured to obtain at least one piece of comment information of the text to be processed; the determining unit 602 is configured to determine a similarity between the text to be processed and at least one piece of comment information as a target similarity; the generating unit 603 is configured to generate quality information of the text to be processed according to the target similarity, wherein the quality information is used for representing the quality of the text to be processed.
In the present embodiment, in the apparatus 600 for generating information, the specific processing of the obtaining unit 601, the determining unit 602, and the generating unit 603 and the technical effects thereof may refer to the related descriptions of steps 201 to 204 in the corresponding embodiment of fig. 2, which are not repeated herein.
In some optional implementations of this embodiment, the determining unit is further configured to: respectively determining the similarity between the comment information in at least one piece of comment information and the text to be processed to obtain a similarity set; respectively determining the weight value of the comment information in at least one piece of comment information; and determining the weighted average of the similarity in the similarity set as the target similarity.
In some optional implementations of this embodiment, the determining unit is further configured to: split the text to be processed into at least two sub-texts; for each sub-text in the at least two sub-texts, determine the similarity between the sub-text and each piece of comment information in the at least one piece of comment information to obtain a similarity set corresponding to the sub-text; respectively determine the weight value of the comment information in the at least one piece of comment information; determine the weighted average of the similarities in the similarity set corresponding to the sub-text as the target similarity corresponding to the sub-text; and determine the target similarity corresponding to the text to be processed according to the target similarities corresponding to the sub-texts in the at least two sub-texts.
In some optional implementations of this embodiment, the determining unit is further configured to: and determining the average value or the maximum value of the target similarity corresponding to the sub-texts in the at least two sub-texts as the target similarity corresponding to the text to be processed.
In some optional implementations of this embodiment, the determining unit is further configured to: for comment information in at least one piece of comment information, obtaining statistical information of user operation corresponding to the comment information; and determining the weight of the comment information according to the statistical information.
In the apparatus provided by the above embodiment of the present disclosure, the obtaining unit acquires the text to be processed and at least one piece of comment information of the text to be processed; the determining unit determines the similarity between the text to be processed and the at least one piece of comment information as the target similarity; and the generating unit generates the quality information of the text to be processed according to the target similarity, the quality information being used for representing the quality of the text to be processed. In this way, the evaluation of the quality of the text is completed by using the similarity between the text and its comments, the number of features that can be used to characterize the text is increased, and the text-quality feature can further be applied to text-related analysis and processing.
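For illustration only, the cooperation of the obtaining unit, the determining unit, and the generating unit might be organized as in the following sketch; the class and method names are hypothetical and do not limit how the units are implemented:

```python
class InformationGeneratingApparatus:
    # Hypothetical composition of the three units of apparatus 600.
    def __init__(self, obtaining_unit, determining_unit, generating_unit):
        self.obtain = obtaining_unit       # corresponds to unit 601
        self.determine = determining_unit  # corresponds to unit 602
        self.generate = generating_unit    # corresponds to unit 603

    def run(self, text_id):
        text, comments = self.obtain(text_id)
        target_similarity = self.determine(text, comments)
        return self.generate(target_similarity)
```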
Referring now to FIG. 7, a block diagram of an electronic device (e.g., the server of FIG. 1) 700 suitable for use in implementing embodiments of the present disclosure is shown. The server shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 7, the electronic device 700 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 701, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703. Various programs and data necessary for the operation of the electronic device 700 are also stored in the RAM 703. The processing device 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Generally, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 707 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, and the like; a storage device 708 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 709. The communication device 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 illustrates an electronic device 700 having various devices, it is to be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 7 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication means 709, or may be installed from the storage means 708, or may be installed from the ROM 702. The computer program, when executed by the processing device 701, performs the above-described functions defined in the methods of embodiments of the present disclosure.
It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the server; or may exist separately and not be assembled into the server. The computer readable medium carries one or more programs which, when executed by the server, cause the server to: acquiring a text to be processed; acquiring at least one piece of comment information of a text to be processed; determining the similarity between the text to be processed and at least one piece of comment information as target similarity; and generating quality information of the text to be processed according to the target similarity, wherein the quality information is used for representing the quality of the text to be processed.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor comprising: the device comprises an acquisition unit, a determination unit and a generation unit. The names of these units do not in some cases constitute a limitation on the unit itself, and for example, the acquiring unit may also be described as a "unit that acquires text to be processed".
The foregoing description is only a description of the preferred embodiments of the present disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.