WO2019085856A1 - Text resource push method and apparatus, storage medium, and processor - Google Patents

Text resource push method and apparatus, storage medium, and processor

Info

Publication number
WO2019085856A1
WO2019085856A1 PCT/CN2018/112379 CN2018112379W WO2019085856A1 WO 2019085856 A1 WO2019085856 A1 WO 2019085856A1 CN 2018112379 W CN2018112379 W CN 2018112379W WO 2019085856 A1 WO2019085856 A1 WO 2019085856A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
similarity
target
determining
keyword
Prior art date
Application number
PCT/CN2018/112379
Other languages
French (fr)
Chinese (zh)
Inventor
石鹏
王福伟
Original Assignee
北京国双科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京国双科技有限公司
Publication of WO2019085856A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications

Definitions

  • the present application relates to the field of communications, and in particular, to a method, an apparatus, a storage medium, and a processor for pushing a text resource.
  • the embodiment of the present application provides a method, an apparatus, a storage medium, and a processor for pushing a text resource, so as to at least solve the problem that the push efficiency of the text resource in the related art is low.
  • a method for pushing a text resource, including: determining a target text type of a first text acquired by a client; searching a preset text set for a plurality of second texts whose type is the target text type; determining a first similarity between the first text and each of the plurality of second texts; determining that a second text whose first similarity satisfies a preset condition is a target text; and pushing the target text to the client.
  • determining the first similarity between the first text and the plurality of second texts comprises: dividing the first text according to a plurality of first keywords to obtain a first text block set, wherein each first keyword is used to indicate a feature of a text paragraph, the first keywords are in one-to-one correspondence with the first text blocks in the first text block set, and the first keywords are in one-to-one correspondence with the second text blocks in each of the second texts; determining a first target similarity between the first text block and the second text block corresponding to each of the first keywords; and determining the first similarity according to the first target similarities and the first preset weight corresponding to each first target similarity.
  • determining the first similarity according to the first target similarities and the first preset weight corresponding to each first target similarity includes one of: determining a first weighted sum of the first target similarities according to the first preset weights, and using the first weighted sum as the first similarity; determining a first weighted average of the first target similarities according to the first preset weights, and using the first weighted average as the first similarity; determining a second target similarity between the first text and the second text, determining a second weighted sum of the first target similarities and the second target similarity according to the first preset weights and a second preset weight corresponding to the second target similarity, and using the second weighted sum as the first similarity; determining a second target similarity between the first text and the second text, determining a second weighted average of the first target similarities and the second target similarity according to the first preset weights and the second preset weight corresponding to the second target similarity, and using the second weighted average as the first similarity.
  • determining that the second text whose first similarity satisfies the preset condition is the target text includes one of: determining that a second text whose first similarity falls within a preset threshold range is the target text; sorting the second texts by first similarity from high to low, and determining that the top preset number of second texts are the target text.
  • determining the target text type of the first text acquired by the client comprises: searching for a paragraph in which the second keyword is located in the first text, and determining the found paragraph as a feature paragraph; obtaining a third keyword in the feature paragraph; searching for the target text type corresponding to the third keyword from a correspondence between a keyword and a text type.
  • a text resource pushing apparatus, comprising: a first determining module configured to determine a target text type of a first text acquired by a client; a lookup module configured to search a preset text set for a plurality of second texts whose type is the target text type; a second determining module configured to determine a first similarity between the first text and the plurality of second texts; a third determining module configured to determine that the second text whose first similarity satisfies the preset condition is the target text; and a pushing module configured to push the target text to the client.
  • the second determining module includes: a dividing unit configured to divide the first text according to a plurality of first keywords to obtain a first text block set, where each first keyword is used to indicate a feature of a text paragraph, the first keywords are in one-to-one correspondence with the first text blocks in the first text block set, and the first keywords are in one-to-one correspondence with the second text blocks in each of the second texts; a first determining unit configured to determine a first target similarity between the first text block and the second text block corresponding to each of the first keywords; and a second determining unit configured to determine the first similarity according to the first target similarities and the first preset weight corresponding to each first target similarity.
  • the first determining module includes: a first searching unit, configured to search for a paragraph in which the second keyword is located in the first text, and determine the found paragraph as a feature paragraph; a unit, configured to obtain a third keyword in the feature paragraph; and a second search unit configured to search for the target text type corresponding to the third keyword from a correspondence between a keyword and a text type.
  • a storage medium comprising a stored program, wherein the program is executed to perform the method of any of the above.
  • a processor for running a program wherein the program is executed to perform the method of any of the above.
  • by determining a target text type of the first text acquired by the client, searching the preset text set for a plurality of second texts of the target text type, determining a first similarity between the first text and the plurality of second texts, determining that the second text whose first similarity satisfies the preset condition is the target text, and pushing the target text to the client, the scheme searches the preset text set for second texts of the same type as the first text, which ensures that the pushed text resource is of the same type as the text the user wishes to find.
  • a second text with a higher similarity to the first text is then selected from the found second texts as the target text and pushed to the client, so that the text resource pushed to the client has the same text type as, and is similar to, the first text obtained from the client.
  • this improves the relevance and effectiveness of pushing text resources to the client and thus the push efficiency, thereby solving the problem in the related art that the push efficiency of text resources is low.
  • FIG. 1 is a block diagram showing the hardware structure of a mobile terminal for pushing a text resource according to an embodiment of the present application
  • FIG. 2 is a flowchart of a method for pushing a text resource according to an embodiment of the present application
  • FIG. 3 is a schematic diagram of determining a first similarity according to an alternative embodiment of the present application.
  • FIG. 4 is a structural block diagram 1 of a text resource pushing apparatus according to an embodiment of the present application.
  • FIG. 5 is a structural block diagram 2 of a text resource pushing apparatus according to an embodiment of the present application.
  • FIG. 6 is a structural block diagram 3 of a text resource pushing apparatus according to an embodiment of the present application.
  • Embodiment 1 of the present application can be executed in a mobile terminal, a computer terminal or the like.
  • the mobile terminal 10 may include one or more processors 102 (only one is shown in the figure; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device 106 for communication functions. It will be understood by those skilled in the art that the structure shown in FIG. 1 is merely illustrative and does not limit the structure of the above electronic device.
  • the mobile terminal 10 may also include more or fewer components than those shown in FIG. 1, or have a different configuration from that shown in FIG. 1.
  • the memory 104 can be configured to store software programs and modules of application software, such as the program instructions/modules corresponding to the method for pushing a text resource in the embodiment of the present application. The processor 102 runs the software programs and modules stored in the memory 104, thereby performing various functional applications and data processing, that is, implementing the above method.
  • Memory 104 may include high speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory.
  • memory 104 may further include memory remotely located relative to processor 102, which may be connected to mobile terminal 10 over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • Transmission device 106 is arranged to receive or transmit data via a network.
  • a specific example of the above network may include a wireless network provided by a communication provider of the mobile terminal 10.
  • the transmission device 106 includes a Network Interface Controller (NIC) that can be connected to other network devices through a base station to communicate with the Internet.
  • the transmission device 106 can be a Radio Frequency (RF) module configured to communicate with the Internet wirelessly.
  • FIG. 2 is a flowchart of a method for pushing a text resource according to an embodiment of the present application. As shown in FIG. 2, the process includes the following steps:
  • Step S202: determining a target text type of the first text acquired by the client;
  • Step S204: searching the preset text set for a plurality of second texts of the target text type;
  • Step S206: determining a first similarity between the first text and the plurality of second texts;
  • Step S208: determining that the second text whose first similarity meets the preset condition is the target text;
  • Step S210: pushing the target text to the client.
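  • the five steps can be sketched in Python as follows. This is only an illustrative outline, not the implementation described in the application: the function name `push_text_resource`, the dictionary keys `type` and `body`, and the callables passed in for classification, similarity, the preset condition, and pushing are all hypothetical.

```python
from typing import Callable, Dict, List


def push_text_resource(
    first_text: str,
    preset_texts: List[Dict[str, str]],
    classify: Callable[[str], str],
    similarity: Callable[[str, str], float],
    condition: Callable[[float], bool],
    push: Callable[[List[str]], None],
) -> List[str]:
    """Hypothetical outline of steps S202 to S210."""
    # S202: determine the target text type of the first text.
    target_type = classify(first_text)
    # S204: keep only the preset texts whose type equals the target type.
    second_texts = [t["body"] for t in preset_texts if t["type"] == target_type]
    # S206: compute the first similarity for every second text.
    scored = [(similarity(first_text, s), s) for s in second_texts]
    # S208: the target texts are those whose similarity meets the condition.
    targets = [s for score, s in scored if condition(score)]
    # S210: push the target texts to the client.
    push(targets)
    return targets
```

  • passing the individual steps in as callables keeps the outline independent of how the text type, the similarity, and the preset condition are actually computed.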
  • the method for pushing a text resource may be applied, without limitation, to a scenario in which a text resource is pushed to a user.
  • for example, a scenario in which a text resource is pushed to a user in a news information application, a scenario in which a text resource is pushed to a user in a text resource reading application, and the like.
  • the method for pushing a text resource may be applied, without limitation, to a terminal device or a server device.
  • the terminal device may include, but is not limited to, a mobile phone, a tablet computer, a PC, a smart wearable device, an intelligent electronic device, smart home equipment, and the like.
  • the foregoing client may be, but is not limited to, a client of an application; for example, the application may include, but is not limited to, a news information application, a text resource reading application, an instant messaging application, a browser application, and the like.
  • the text type of the text resource may be, but is not limited to, divided according to the domain involved in the text resource, such as: sports, entertainment, technology, finance, military, and the like.
  • the text type of the text resource can also be divided according to the classification rules within a certain category of text resources. For example, judgment documents among legal documents are classified by trial level into first-instance documents, second-instance documents, and so on, and by case type into civil case documents or administrative case documents; alternatively, trial transcripts and judgment documents among legal documents are divided into trademark infringement dispute documents, right-to-life dispute documents, divorce dispute documents, and so on.
  • the preset condition may be set as a condition for acquiring a second text with a higher first similarity.
  • for example, the first similarity is the highest, or the first similarity is higher than a certain preset value, and the like.
  • the pre-document is an important written material and basis for a court's adjudication of a case.
  • taking court documents as an example, similar cases can be intelligently recommended to a judge according to the pre-document material of the current case.
  • the important components of the pre-document include trial transcripts, complaints, pleadings, and so on.
  • the judgment document records the process and result of the court's trial of the case and is the carrier of the outcome of the litigation.
  • the pre-document includes the following information: a summary of the nature of the legal relationship involved in the litigation (the cause of action); the claims of the plaintiff or the appellant; the defense of the defendant or the appellee; and the presentation of evidence, debate, and cross-examination opinions of the parties. This information is an important reference for judges in deciding the case.
  • the judgment document also includes the above information, together with the court's reasoning on the case, the law applied in the decision, and the outcome of the decision.
  • the server receives the first text obtained by the client. The first text is a pre-document of a currently pending case, and the text type of the pre-document can be classified by case type as a trademark infringement dispute, a right-to-life dispute, a divorce dispute, and so on.
  • the server determines that the target text type of the current pre-document is a civil case concerning a trademark infringement dispute. The preset text set includes a large number of judgment documents of decided cases.
  • the server searches the preset text set for judgment documents of civil cases concerning trademark infringement disputes and takes them as the plurality of second texts, then determines the first similarity between the current pre-document and each of these judgment documents.
  • the judgment documents are sorted by first similarity, the top 10 are determined as the target text, and the target text is pushed to the client.
  • the first text may be divided into a plurality of first text blocks, and a first target similarity may be determined between each first text block and the corresponding second text block in the second text.
  • the first target similarities are then weighted according to the degree of influence of each text block on the similarity between the two texts, thereby determining the first similarity between the first text and each second text.
  • the first text is divided according to the plurality of first keywords to obtain a first text block set, wherein each first keyword is used to indicate a feature of a text paragraph, the first keywords are in one-to-one correspondence with the first text blocks in the first text block set, and the first keywords are in one-to-one correspondence with the second text blocks in each second text. A first target similarity is determined between the first text block and the second text block corresponding to each first keyword, and the first similarity is determined according to the first target similarities and the first preset weight corresponding to each first target similarity.
  • the second text blocks in the second text may be, but are not limited to being, obtained in advance by the server through preprocessing and parsing. There may be a plurality of second text blocks: the server may parse the second text into a plurality of second text blocks according to the features of the paragraph content.
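  • dividing a text into blocks by keyword could look like the following Python sketch. The splitting rule is an assumption made for illustration (a paragraph that contains a keyword opens that keyword's block, and later paragraphs stay in the block until another keyword appears); the application does not specify the rule, and the name `split_into_blocks` and the sample keywords are hypothetical.

```python
from typing import Dict, List, Optional


def split_into_blocks(text: str, keywords: List[str]) -> Dict[str, str]:
    """Map each keyword to the text block it introduces (assumed rule)."""
    blocks: Dict[str, List[str]] = {k: [] for k in keywords}
    current: Optional[str] = None
    for paragraph in text.split("\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # A paragraph containing a keyword opens that keyword's block.
        for keyword in keywords:
            if keyword in paragraph:
                current = keyword
                break
        # Paragraphs before the first keyword belong to no block.
        if current is not None:
            blocks[current].append(paragraph)
    return {k: " ".join(parts) for k, parts in blocks.items()}
```

  • because the same keywords are applied to the first text and to every second text, the resulting dictionaries share their keys, which gives the one-to-one correspondence between first and second text blocks.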
  • the server can parse the judgment documents in the judgment document library into the following paragraphs: the appeal section, the defense section, the evidence section, the cross-examination opinion section, the dispute focus section, the fact-finding section, the court-opinion section, and so on.
  • the server performs paragraph analysis on the trial transcript in the first text and obtains the following description paragraphs: the plaintiff's claim paragraph (the facts, reasons, and claims of the action filed by the plaintiff); the defendant's reply paragraph (the defendant's statement of opinion on the plaintiff's claims); the evidence paragraph (the evidence presented by both parties); the cross-examination and debate paragraph (mutual cross-examination and debate between the parties); and the court inquiry paragraph (the court's inquiries and the parties' answers).
  • the server analyzes the paragraphs of the complaint or the appeal petition and obtains the following description paragraphs: the plaintiff's claim paragraph and the facts-and-reasons paragraph.
  • paragraph analysis of the defense or the appeal reply yields the following description paragraph: the defendant's defense.
  • the first text blocks include: complaint (appeal section) + appeal petition (appeal section); reply (defense section) + appeal reply (defense section); and trial transcript (court investigation and debate section) + complaint (facts-and-reasons section).
  • the second text blocks of the second text include: judgment document (appeal section); judgment document (defense section); judgment document (fact-finding section + court-opinion section); and the full text of the judgment document.
  • the server determines that the first target similarity between complaint (appeal section) + appeal petition (appeal section) and judgment document (appeal section) is S1; that the first target similarity between reply (defense section) + appeal reply (defense section) and judgment document (defense section) is S2; and that the first target similarity between trial transcript (court investigation and debate section) + complaint (facts-and-reasons section) and judgment document (fact-finding section + court-opinion section) is S3.
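  • the per-block similarities S1, S2, and S3 could be computed as in the Python sketch below. The application does not name a similarity measure, so token-level Jaccard similarity is used here purely as a stand-in assumption; the function names and the block keys in the usage are hypothetical.

```python
from typing import Dict


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (an assumed stand-in measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def block_similarities(
    first_blocks: Dict[str, str], second_blocks: Dict[str, str]
) -> Dict[str, float]:
    """Compare blocks that share a keyword; a missing block scores 0.0."""
    return {
        keyword: jaccard(block, second_blocks.get(keyword, ""))
        for keyword, block in first_blocks.items()
    }
```

  • any other text similarity measure with the same signature could be substituted for `jaccard` without changing the rest of the flow.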
  • the first preset weights corresponding to the first target similarity are W1, W2, and W3, respectively.
  • the server determines the first similarity according to the first preset weight corresponding to the first target similarity and the first target similarity.
  • the first similarity may be determined by one of the following methods:
  • in the first method, the first weighted sum of the first target similarities is determined according to the first preset weights, and the first weighted sum is used as the first similarity.
  • for example, the first similarity P = W1*S1 + W2*S2 + W3*S3.
  • in the second method, the first weighted average of the first target similarities is determined according to the first preset weights, and the first weighted average is used as the first similarity.
  • for example, the first similarity P = (W1*S1 + W2*S2 + W3*S3)/3.
  • in the third method, a second target similarity between the first text and the second text is determined, a second weighted sum of the first target similarities and the second target similarity is determined according to the first preset weights and the second preset weight corresponding to the second target similarity, and the second weighted sum is used as the first similarity.
  • for example, the second target similarity between the first text and the second text is X, and the second preset weight corresponding to the second target similarity is V.
  • in the fourth method, a second target similarity between the first text and the second text is determined, a second weighted average of the first target similarities and the second target similarity is determined according to the first preset weights and the second preset weight corresponding to the second target similarity, and the second weighted average is used as the first similarity.
  • for example, the second target similarity between the first text and the second text is X, and the second preset weight corresponding to the second target similarity is V.
  • by determining a second target similarity between the first text and the second text as a whole and taking its influence into account when determining the first similarity, it is possible to avoid selecting texts whose individual blocks are highly similar but which are not similar overall.
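  • the four ways of combining the similarities can be sketched in Python as below. The function `first_similarity` and its parameters are hypothetical. Following the example formula for the weighted average, the average divides by the number of terms rather than by the sum of the weights; dividing by one more term when the whole-text similarity X is included is an assumption made by analogy.

```python
from typing import Optional, Sequence


def first_similarity(
    s: Sequence[float],
    w: Sequence[float],
    x: Optional[float] = None,
    v: Optional[float] = None,
    average: bool = False,
) -> float:
    """Combine block similarities s with weights w, optionally adding v * x."""
    if len(s) != len(w):
        raise ValueError("s and w must have the same length")
    total = sum(wi * si for wi, si in zip(w, s))
    count = len(s)
    if x is not None:
        if v is None:
            raise ValueError("a weight v is required when x is given")
        # Add the whole-text similarity X with its weight V.
        total += v * x
        count += 1
    # The average divides by the number of terms (assumed for the X case).
    return total / count if average else total
```

  • for instance, with S = (0.8, 0.6, 0.4) and W = (0.5, 0.3, 0.2), the weighted sum is 0.66 and the weighted average is 0.22; adding X = 0.5 with V = 0.2 gives a sum of 0.76.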
  • the target text may be determined from the plurality of second texts by one of the following methods:
  • a second text whose first similarity falls within a preset threshold range is determined as the target text. For example, if the preset threshold range is set to values higher than P0, every second text whose first similarity is greater than P0 is used as the target text.
  • alternatively, the second texts are sorted by first similarity from high to low, and the top preset number of second texts are determined as the target text.
  • for example, the preset number may be 1, in which case the second text with the highest first similarity among the plurality of second texts is determined as the target text.
  • the preset number may also be 10, in which case the ten second texts with the highest first similarity are selected from the plurality of second texts as the target text.
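  • both selection rules are short enough to sketch directly in Python. The function names and the representation of each candidate as a `(text, similarity)` pair are hypothetical choices for illustration.

```python
from typing import List, Tuple

Scored = List[Tuple[str, float]]


def select_by_threshold(scored: Scored, p0: float) -> List[str]:
    """Keep every second text whose first similarity is greater than p0."""
    return [text for text, p in scored if p > p0]


def select_top_n(scored: Scored, n: int) -> List[str]:
    """Sort by first similarity, high to low, and keep the first n texts."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked[:n]]
```

  • the threshold rule returns a variable number of texts, whereas the top-n rule returns at most n texts however low their similarities are.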
  • the target text type may be determined as follows: the paragraph of the first text in which the second keyword is located is searched for and determined as the feature paragraph; a third keyword is obtained from the feature paragraph; and the target text type corresponding to the third keyword is looked up in the correspondence between keywords and text types.
  • the second keyword may be used to indicate a feature of the target text paragraph and corresponds to a first text block in the first text block set; the third keyword may be a keyword used to characterize a text type.
  • for example, the first text is a trial transcript. The server searches the trial transcript for the paragraph containing the second keyword "announces the hearing" and determines that paragraph as the feature paragraph, from which the third keyword "trademark ownership dispute" is obtained. The server can then look up, in the correspondence between keywords and text types, that the target text type corresponding to "trademark ownership dispute" is the trademark ownership dispute case.
  • the cause of action, the trial level, and the case type in the pre-document can be parsed. The description of the case in the trial transcript is extracted first (from the paragraph announcing the opening of the court). For example: "Judge: The court is now in session. The Beijing xxx People's Court today, applying the ordinary procedure in accordance with the law, publicly hears the trademark ownership dispute case of plaintiff xxx v. defendant xx. The case is tried by acting judge xxx of this court as the presiding judge, who forms a collegiate panel according to law with acting judges xxx and xxx of this court, and clerk xxx serves as the court recorder."
  • the server extracts the cause of action (trademark ownership dispute) and parses the case type (civil or administrative).
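  • the keyword lookup can be sketched in Python as follows. The function `determine_text_type`, the use of line breaks as paragraph boundaries, and the sample mapping in the usage are assumptions for illustration; the application does not specify how paragraphs are delimited or how the correspondence is stored.

```python
from typing import Dict, Optional


def determine_text_type(
    text: str, second_keyword: str, keyword_to_type: Dict[str, str]
) -> Optional[str]:
    """Find the feature paragraph, then map a third keyword to a text type."""
    for paragraph in text.split("\n"):
        # The feature paragraph is the one that contains the second keyword.
        if second_keyword not in paragraph:
            continue
        # A third keyword is any mapping key found in the feature paragraph.
        for third_keyword, text_type in keyword_to_type.items():
            if third_keyword in paragraph:
                return text_type
    return None  # no feature paragraph or no known third keyword
```

  • returning `None` when nothing matches leaves the caller free to decide how to handle a first text whose type cannot be determined.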
  • a text resource pushing device is further provided, which is used to implement the above embodiments and optional embodiments; what has already been described will not be repeated.
  • as used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function.
  • although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • FIG. 4 is a structural block diagram 1 of a text resource pushing apparatus according to an embodiment of the present application. As shown in FIG. 4, the apparatus includes:
  • the first determining module 402 is configured to determine a target text type of the first text acquired by the client;
  • the searching module 404 is coupled to the first determining module 402, and configured to search the preset text set for a plurality of second texts whose type is the target text type;
  • a second determining module 406 coupled to the lookup module 404, configured to determine a first similarity between the first text and the plurality of second texts;
  • the third determining module 408 is coupled to the second determining module 406, and is configured to determine that the second text whose first similarity meets the preset condition is the target text;
  • the push module 410 coupled to the third determination module 408, is configured to push the target text to the client.
  • the text resource pushing device may be applied, without limitation, to a scenario in which a text resource is pushed to a user.
  • for example, a scenario in which a text resource is pushed to a user in a news information application, a scenario in which a text resource is pushed to a user in a text resource reading application, and the like.
  • the text resource pushing device may be applied, without limitation, to a terminal device or a server device.
  • the terminal device may include, but is not limited to, a mobile phone, a tablet computer, a PC, a smart wearable device, an intelligent electronic device, smart home equipment, and the like.
  • the client may be, but is not limited to, a client of an application; for example, the application may include, but is not limited to, a news information application, a text resource reading application, an instant messaging application, a browser application, and the like.
  • the text type of the text resource may be, but is not limited to, divided according to the domain involved in the text resource, such as: sports, entertainment, technology, finance, military, and the like.
  • the text type of the text resource can also be divided according to the classification rules within a certain category of text resources. For example, judgment documents among legal documents are classified by trial level into first-instance documents, second-instance documents, and so on, and by case type into civil case documents or administrative case documents; alternatively, trial transcripts and judgment documents among legal documents are divided into trademark infringement dispute documents, right-to-life dispute documents, divorce dispute documents, and so on.
  • the preset condition may be set as a condition for acquiring a second text with a higher first similarity.
  • for example, the first similarity is the highest, or the first similarity is higher than a certain preset value, and the like.
  • the pre-document is an important written material and basis for a court's adjudication of a case.
  • taking court documents as an example, similar cases can be intelligently recommended to a judge according to the pre-document material of the current case.
  • the important components of the pre-document include trial transcripts, complaints, pleadings, and so on.
  • the judgment document records the process and result of the court's trial of the case and is the carrier of the outcome of the litigation.
  • the pre-document includes the following information: a summary of the nature of the legal relationship involved in the litigation (the cause of action); the claims of the plaintiff or the appellant; the defense of the defendant or the appellee; and the presentation of evidence, debate, and cross-examination opinions of the parties. This information is an important reference for judges in deciding the case.
  • the judgment document also includes the above information, together with the court's reasoning on the case, the law applied in the decision, and the outcome of the decision.
  • the server receives the first text obtained by the client. The first text is a pre-document of a currently pending case, and the text type of the pre-document can be classified by case type as a trademark infringement dispute, a right-to-life dispute, a divorce dispute, and so on.
  • the server determines that the target text type of the current pre-document is a civil case concerning a trademark infringement dispute. The preset text set includes a large number of judgment documents of decided cases.
  • the server searches the preset text set for judgment documents of civil cases concerning trademark infringement disputes and takes them as the plurality of second texts, then determines the first similarity between the current pre-document and each of these judgment documents.
  • the judgment documents are sorted by first similarity, the top 10 are determined as the target text, and the target text is pushed to the client.
  • the above device searches the preset text set for a plurality of second texts of the same text type according to the target text type of the acquired first text, which ensures that the pushed text resource is of the same type as the text the user wishes to find.
  • it then selects, from the found second texts, a second text with a higher similarity to the first text as the target text and pushes the target text to the client, so that the text resource pushed to the client has the same text type as, and is similar to, the first text obtained from the client.
  • this improves the pertinence and validity of pushing text resources to the client and thus the push efficiency, thereby solving the problem in the related art that the push efficiency of text resources is low.
  • FIG. 5 is a block diagram of a structure of a text resource pushing apparatus according to an embodiment of the present application.
  • the second determining module 406 includes:
  • the dividing unit 52 is configured to divide the first text according to the plurality of first keywords to obtain a first text block set, wherein each first keyword is used to indicate a feature of a text paragraph, the first keywords are in one-to-one correspondence with the first text blocks in the first text block set, and the first keywords are in one-to-one correspondence with the second text blocks in each second text;
  • the first determining unit 54 coupled to the dividing unit 52, is configured to determine a first target similarity between the first text block and the second text block corresponding to each first keyword;
  • the second determining unit 56 is coupled to the first determining unit 54 and configured to determine the first similarity according to the first target similarities and the first preset weight corresponding to each first target similarity.
  • the first text may be divided into a plurality of first text blocks, a first target similarity may be determined between each first text block and the corresponding second text block in the second text, and the first target similarities may then be weighted according to how strongly each text block influences the similarity between the two texts, thereby determining the first similarity between the first text and each second text.
  • the second text blocks in the second text may be, but are not limited to being, obtained by the server through pre-processing and parsing; there may be multiple second text blocks, and the server may parse the second text into a plurality of second text blocks according to the features of the paragraph content.
  • the server can parse the judgment documents in the judgment document library into the following paragraphs: the claims section, the defense section, the evidence section, the cross-examination opinion section, the dispute focus section, the trial findings section, the "court holds" section, and so on.
  • the server performs paragraph analysis on the trial transcript in the first text and obtains the following paragraphs: the plaintiff's claims paragraph (the facts, reasons, and claims of the action brought by the plaintiff); the in-court reply paragraph (the statement of opinion given in court in response to the plaintiff's claims); the evidence paragraph (the evidence presented by both parties); the cross-examination opinion and debate paragraph (the mutual cross-examination and debate between the parties); and the court inquiry paragraph (the court's inquiries and the parties' answers).
  • the server performs paragraph analysis on the complaint or the notice of appeal and obtains the following paragraphs: the plaintiff's claims paragraph and the facts-and-reasons paragraph.
  • paragraph analysis of the answer or the reply to the appeal yields the following paragraph: the defendant's defense paragraph.
  • the above first text blocks include: the complaint (claims section) + the notice of appeal (claims section); the answer (defense section) + the reply to the appeal (defense section); and the trial transcript (court investigation and debate section) + the complaint (facts-and-reasons section).
  • the second text blocks of the second text include: the judgment document (claims section), the judgment document (defense section), the judgment document (fact-finding section + "court holds" section), and the full text of the judgment document.
  • the server determines that the first target similarity between the complaint (claims section) + the notice of appeal (claims section) and the judgment document (claims section) is S1; that the first target similarity between the answer (defense section) + the reply to the appeal (defense section) and the judgment document (defense section) is S2; and that the first target similarity between the trial transcript (court investigation and debate section) + the complaint (facts-and-reasons section) and the judgment document (fact-finding section + "court holds" section) is S3.
  • the first preset weights corresponding to the first target similarity are W1, W2, and W3, respectively.
  • the server determines the first similarity according to the first preset weight corresponding to the first target similarity and the first target similarity.
  • the second determining unit 56 is set to one of the following:
  • the first similarity P = W1*S1 + W2*S2 + W3*S3.
  • the first similarity P = (W1*S1 + W2*S2 + W3*S3)/3.
  • Determining a second target similarity between the first text and the second text; determining a second weighted average of the first target similarities and the second target similarity according to the first preset weights and the second preset weight corresponding to the second target similarity; and taking the second weighted average as the first similarity.
  • the second target similarity between the first text and the second text is X
  • the second predetermined weight corresponding to the second target similarity is V
  • by determining a second target similarity between the first text and the second text and taking its influence on the first similarity between the two texts into account, the process of determining similarity can avoid selecting texts whose individual text blocks are highly similar but which are not similar overall.
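The weighting options above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `first_similarity` and its parameters are hypothetical, and dividing the extended sum by the number of terms when the full-text similarity X is included is an assumption, since the text above gives the explicit formula only for the three-block case.

```python
from typing import Optional, Sequence


def first_similarity(
    block_sims: Sequence[float],
    block_weights: Sequence[float],
    full_text_sim: Optional[float] = None,
    full_text_weight: float = 0.0,
    average: bool = False,
) -> float:
    """Combine per-block similarities S1..Sn with preset weights W1..Wn.

    If full_text_sim (X) is given, the term V*X is added, where V is
    full_text_weight. With average=True the sum is divided by the number
    of terms, mirroring P = (W1*S1 + W2*S2 + W3*S3)/3 (assumed to extend
    to n + 1 terms when X is included).
    """
    if len(block_sims) != len(block_weights):
        raise ValueError("each similarity needs exactly one weight")
    total = sum(w * s for w, s in zip(block_weights, block_sims))
    count = len(block_sims)
    if full_text_sim is not None:
        total += full_text_weight * full_text_sim
        count += 1
    return total / count if average else total
```

Calling `first_similarity([s1, s2, s3], [w1, w2, w3])` gives the first weighted sum; passing `average=True` gives the first weighted average; additionally passing `full_text_sim` and `full_text_weight` gives the variants that include the second target similarity.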
  • the third determining module 408 is set to one of the following:
  • Determining that the second text whose first similarity falls within the preset threshold range is the target text; for example, if the preset threshold range is set to be higher than P0, the second text whose first similarity is greater than P0 is used as the target text.
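As a rough illustration of the preset condition, the sketch below shows a threshold-based selection as described above, together with the ranking-based selection used in the judgment document example (top 10). The function names and the dictionary mapping a document identifier to its first similarity are hypothetical choices made for this example.

```python
from typing import Dict, List


def select_by_threshold(scores: Dict[str, float], p0: float) -> List[str]:
    """Return the documents whose first similarity is greater than P0."""
    return [doc for doc, p in scores.items() if p > p0]


def select_top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Sort documents by first similarity, high to low, and keep the first n."""
    ranked = sorted(scores, key=lambda doc: scores[doc], reverse=True)
    return ranked[:n]
```

Either function returns the identifiers of the second texts that would be treated as target texts.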
  • FIG. 6 is a block diagram 3 of a structure of a text resource pushing apparatus according to an embodiment of the present application.
  • the first determining module 402 includes:
  • the first searching unit 62 is configured to search for a paragraph in which the second keyword is located in the first text, and determine the found paragraph as a feature paragraph;
  • An obtaining unit 64 coupled to the first searching unit 62, configured to acquire a third keyword in the feature paragraph;
  • the second searching unit 66 is coupled to the obtaining unit 64 and configured to search for the target text type corresponding to the third keyword from the correspondence between the keyword and the text type.
  • the second keyword may be used to indicate a feature of the target text segment, the second keyword corresponds to a first text block in the first text block set, and the third keyword may be appealed.
  • the first text mentioned above is a trial transcript
  • the server searches the trial transcript for the paragraph containing the second keyword "announces the hearing", identifies that paragraph as the feature paragraph, and obtains the third keyword "trademark rights" from it.
  • the server can search for the target text type corresponding to the “trademark dispute” from the correspondence between the keyword and the text type as the trademark ownership dispute case.
  • the cause of action, the trial level, and the case type in the pre-judgment document can be parsed. The description of the cause of action is first extracted from the trial transcript (in the paragraph announcing the opening of the court session). For example: "Judge: The court is now in session. The Beijing xxx People's Court today applies the ordinary procedure in accordance with the law to publicly hear the trademark ownership dispute case of plaintiff xxx v. xx. The case is heard by a collegiate panel formed according to law, with acting judge xxx of this court as presiding judge together with acting judges xxx and xxx of this court, and clerk xxx serves as court recorder."
  • the server extracts the cause of action (trademark ownership dispute) and determines the case type (civil or administrative).
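The type-determination steps above (locate the feature paragraph by the second keyword, find a third keyword in it, look up the text type) might be sketched as follows. The function name, the newline-based paragraph splitting, and the sample mapping are assumptions made for illustration; the source does not specify how paragraphs are delimited or how the correspondence is stored.

```python
from typing import Dict, Optional


def determine_text_type(
    first_text: str,
    second_keyword: str,
    keyword_to_type: Dict[str, str],
) -> Optional[str]:
    """Return the text type for the first third keyword found in a feature paragraph.

    A feature paragraph is any paragraph (here: any line) that contains
    the second keyword. None is returned when nothing matches.
    """
    for paragraph in first_text.split("\n"):
        if second_keyword not in paragraph:
            continue
        # Look for any known third keyword inside the feature paragraph.
        for third_keyword, text_type in keyword_to_type.items():
            if third_keyword in paragraph:
                return text_type
    return None
```

For example, with the hypothetical mapping `{"trademark ownership dispute": "trademark ownership dispute case"}` and a transcript whose opening paragraph contains both the second keyword and that phrase, the function returns "trademark ownership dispute case".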
  • each of the above modules may be implemented by software or hardware.
  • this may be implemented in, but is not limited to, the following manner: the foregoing modules are all located in the same processor; or the modules are located in multiple processors.
  • the embodiment of the present application further provides a storage medium including a stored program, wherein the program runs to perform the method described in any of the above.
  • the foregoing storage medium may be configured to store program code for performing the following steps:
  • the target text is pushed to the client.
  • the foregoing storage medium may include, but is not limited to, a USB flash drive, a Read-Only Memory (ROM), and a Random Access Memory (RAM).
  • Embodiments of the present application also provide a processor for running a program, wherein the program is executed to perform the steps of any of the above methods.
  • the foregoing program is used to perform the following steps:
  • the target text is pushed to the client.
  • modules or steps of the present application can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that described herein, or they may be fabricated separately into individual integrated circuit modules, or a plurality of the modules or steps may be fabricated into a single integrated circuit module.
  • the application is not limited to any particular combination of hardware and software.
  • the application searches the preset text set, according to the target text type of the acquired first text, for a plurality of second texts of the same text type as the first text, which ensures that the pushed text resources are of the same type as the text the user wishes to find. Second texts with a high similarity to the first text are then selected from the retrieved second texts as target texts and pushed to the client, so that the text resources pushed to the client have the same text type as, and are similar to, the first text acquired from the client. This makes pushing text resources to the client more targeted and effective, improves the push efficiency of text resources, and thus solves the problem of low push efficiency of text resources in the related art.

Abstract

The present application provides a text resource push method and apparatus, a storage medium, and a processor. The method comprises: determining the target text type of a first text obtained by a client; searching for multiple second texts whose types are the target text type in a preset text set; determining first similarities between the first text and the multiple second texts; determining second texts whose first similarities satisfy preset conditions, as target texts; and pushing the target texts to the client. By using the technical solution, the problem in the related art of the low push efficiency of text resources is resolved, thereby improving the push efficiency of the text resources.

Description

Text resource pushing method and apparatus, storage medium, and processor
This application is based on, and claims priority to, Chinese patent application No. 201711053298.2, filed on October 31, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of communications, and in particular to a method and an apparatus for pushing a text resource, a storage medium, and a processor.
Background
With the rapid development of Internet technology, the network is flooded with text resources. Users sometimes need to find relevant material to browse, so how to push relevant text resources to users has become a focus of current research. However, existing ways of pushing text resources are poorly targeted and not very effective, which results in low push efficiency.
No effective solution has yet been proposed for the problem of low push efficiency of text resources in the related art.
Summary
Embodiments of the present application provide a method and an apparatus for pushing a text resource, a storage medium, and a processor, so as to at least solve the problem of low push efficiency of text resources in the related art.
According to an embodiment of the present application, a method for pushing a text resource is provided, including: determining a target text type of a first text acquired by a client; searching a preset text set for a plurality of second texts whose type is the target text type; determining a first similarity between the first text and the plurality of second texts; determining a second text whose first similarity satisfies a preset condition as a target text; and pushing the target text to the client.
Optionally, determining the first similarity between the first text and the plurality of second texts includes: dividing the first text according to a plurality of first keywords to obtain a first text block set, where each first keyword indicates a feature of a text paragraph, the first keywords are in one-to-one correspondence with the first text blocks in the first text block set, and the first keywords are in one-to-one correspondence with the second text blocks in each second text; determining a first target similarity between the first text block and the second text block corresponding to each first keyword; and determining the first similarity according to the first target similarities and the first preset weight corresponding to each first target similarity.
Optionally, determining the first similarity according to the first target similarities and the first preset weight corresponding to each first target similarity includes one of the following: determining a first weighted sum of the first target similarities according to the first preset weights, and taking the first weighted sum as the first similarity; determining a first weighted average of the first target similarities according to the first preset weights, and taking the first weighted average as the first similarity; determining a second target similarity between the first text and the second text, determining a second weighted sum of the first target similarities and the second target similarity according to the first preset weights and a second preset weight corresponding to the second target similarity, and taking the second weighted sum as the first similarity; or determining a second target similarity between the first text and the second text, determining a second weighted average of the first target similarities and the second target similarity according to the first preset weights and the second preset weight corresponding to the second target similarity, and taking the second weighted average as the first similarity.
Optionally, determining a second text whose first similarity satisfies the preset condition as the target text includes one of the following: determining a second text whose first similarity falls within a preset threshold range as the target text; or sorting the second texts in descending order of first similarity and determining the first preset number of second texts in that order as the target texts.
Optionally, determining the target text type of the first text acquired by the client includes: searching the first text for the paragraph in which a second keyword is located, and determining the found paragraph as a feature paragraph; obtaining a third keyword from the feature paragraph; and looking up the target text type corresponding to the third keyword in a correspondence between keywords and text types.
According to another embodiment of the present application, an apparatus for pushing a text resource is provided, including: a first determining module, configured to determine a target text type of a first text acquired by a client; a searching module, configured to search a preset text set for a plurality of second texts whose type is the target text type; a second determining module, configured to determine a first similarity between the first text and the plurality of second texts; a third determining module, configured to determine a second text whose first similarity satisfies a preset condition as a target text; and a pushing module, configured to push the target text to the client.
Optionally, the second determining module includes: a dividing unit, configured to divide the first text according to a plurality of first keywords to obtain a first text block set, where each first keyword indicates a feature of a text paragraph, the first keywords are in one-to-one correspondence with the first text blocks in the first text block set, and the first keywords are in one-to-one correspondence with the second text blocks in each second text; a first determining unit, configured to determine a first target similarity between the first text block and the second text block corresponding to each first keyword; and a second determining unit, configured to determine the first similarity according to the first target similarities and the first preset weight corresponding to each first target similarity.
Optionally, the first determining module includes: a first searching unit, configured to search the first text for the paragraph in which a second keyword is located, and to determine the found paragraph as a feature paragraph; an obtaining unit, configured to obtain a third keyword from the feature paragraph; and a second searching unit, configured to look up the target text type corresponding to the third keyword in a correspondence between keywords and text types.
According to still another embodiment of the present application, a storage medium is further provided. The storage medium includes a stored program, where the program, when run, performs the method according to any one of the above.
According to still another embodiment of the present application, a processor is further provided. The processor is configured to run a program, where the program, when run, performs the method according to any one of the above.
With the present application, a target text type of a first text acquired by a client is determined; a preset text set is searched for a plurality of second texts whose type is the target text type; a first similarity between the first text and the plurality of second texts is determined; a second text whose first similarity satisfies a preset condition is determined as a target text; and the target text is pushed to the client. With this scheme, a plurality of second texts of the same text type as the first text are retrieved from the preset text set according to the target text type of the acquired first text, which ensures that the pushed text resources are of the same type as the text the user wishes to find. Second texts with a high similarity to the first text are then selected from the retrieved second texts as target texts and pushed to the client, so that the text resources pushed to the client have the same text type as, and are similar to, the first text acquired from the client. This makes pushing text resources to the client more targeted and effective, improves the push efficiency of text resources, and thus solves the problem of low push efficiency of text resources in the related art.
Brief Description of the Drawings
The drawings described herein provide a further understanding of the present application and form a part of it. The illustrative embodiments of the present application and their descriptions are used to explain the present application and do not unduly limit it. In the drawings:
FIG. 1 is a block diagram of the hardware structure of a mobile terminal for a method for pushing a text resource according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for pushing a text resource according to an embodiment of the present application;
FIG. 3 is a schematic diagram of determining a first similarity according to an optional implementation of the present application;
FIG. 4 is structural block diagram 1 of an apparatus for pushing a text resource according to an embodiment of the present application;
FIG. 5 is structural block diagram 2 of an apparatus for pushing a text resource according to an embodiment of the present application;
FIG. 6 is structural block diagram 3 of an apparatus for pushing a text resource according to an embodiment of the present application.
Detailed Description
The present application is described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, where no conflict arises, the embodiments in the present application and the features in the embodiments may be combined with each other.
It should be noted that the terms "first", "second", and the like in the specification, the claims, and the above drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence.
The method embodiment provided in Embodiment 1 of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a mobile terminal as an example, FIG. 1 is a block diagram of the hardware structure of a mobile terminal for a method for pushing a text resource according to an embodiment of the present application. As shown in FIG. 1, the mobile terminal 10 may include one or more processors 102 (only one is shown in the figure; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device 106 for communication functions. A person of ordinary skill in the art will understand that the structure shown in FIG. 1 is merely illustrative and does not limit the structure of the above electronic device. For example, the mobile terminal 10 may include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
The memory 104 may be configured to store software programs and modules of application software, such as the program instructions/modules corresponding to the method for pushing a text resource in the embodiments of the present application. The processor 102 runs the software programs and modules stored in the memory 104 to execute various functional applications and data processing, that is, to implement the above method. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and such remote memory may be connected to the mobile terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is configured to receive or send data via a network. Specific examples of the network may include a wireless network provided by the communication provider of the mobile terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is configured to communicate with the Internet wirelessly.
This embodiment provides a method for pushing a text resource. FIG. 2 is a flowchart of a method for pushing a text resource according to an embodiment of the present application. As shown in FIG. 2, the process includes the following steps:
Step S202: determine a target text type of a first text acquired by a client;
Step S204: search a preset text set for a plurality of second texts whose type is the target text type;
Step S206: determine a first similarity between the first text and the plurality of second texts;
Step S208: determine a second text whose first similarity satisfies a preset condition as a target text;
Step S210: push the target text to the client.
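Steps S202 to S210 can be read as a small pipeline. The sketch below is one possible arrangement under stated assumptions, not the implementation described in the application: the corpus layout (dictionaries with hypothetical "type" and "text" keys), the `classify` and `send` callables, and the use of `difflib.SequenceMatcher` as a stand-in similarity measure are all illustrative choices, since the source does not name a similarity algorithm.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List


def push_text_resources(
    first_text: str,
    corpus: List[Dict[str, str]],
    classify: Callable[[str], str],
    send: Callable[[str], None],
    top_n: int = 10,
) -> List[str]:
    """Run S202-S210 and return the target texts that were pushed."""
    # S202: determine the target text type of the first text.
    target_type = classify(first_text)
    # S204: keep only second texts of that type.
    candidates = [doc["text"] for doc in corpus if doc["type"] == target_type]
    # S206: compute a first similarity for every candidate
    # (SequenceMatcher is only a placeholder measure).
    scored = [
        (SequenceMatcher(None, first_text, text).ratio(), text)
        for text in candidates
    ]
    # S208: preset condition, here "the top_n most similar texts".
    scored.sort(key=lambda pair: pair[0], reverse=True)
    targets = [text for _, text in scored[:top_n]]
    # S210: push each target text to the client.
    for text in targets:
        send(text)
    return targets
```

Passing the type classifier and the send function as parameters keeps the sketch independent of any particular client protocol or classification rule.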
Optionally, the above method for pushing a text resource may be applied to, but is not limited to, scenarios in which text resources are pushed to a user, for example, pushing text resources to a user in a news application or in a text reading application.
Optionally, the above method for pushing a text resource may be applied to, but is not limited to, a terminal device, a server device, or the like. For example, the terminal device may include, but is not limited to, a mobile phone, a tablet computer, a PC, a smart wearable device, a smart electronic device, a smart home device, and so on.
Optionally, in the above embodiment, the client may be, but is not limited to, a client of an application. For example, the application may include, but is not limited to, a news application, a text reading application, an instant messaging application, a browser application, and so on.
Optionally, in the above embodiment, the text type of a text resource may be, but is not limited to being, divided according to the field the text resource relates to, such as sports, entertainment, technology, finance, or military affairs. The text type may also be, but is not limited to being, divided according to the classification rules within a particular kind of text resource. For example, among legal documents, judgment documents are divided by trial level into first-instance documents, second-instance documents, and so on, and by case type into civil case documents and administrative case documents; alternatively, trial transcripts and judgment documents are divided by cause of action into trademark infringement dispute documents, right-to-life dispute documents, divorce dispute documents, and so on.
Optionally, in this embodiment, the preset condition may be set as a condition for obtaining second texts with a high first similarity, for example, that the first similarity is the highest or that the first similarity is higher than a preset value.
In an optional implementation, take as an example a server that intelligently recommends judgment documents of similar cases to a judge based on the current pre-judgment documents. Pre-judgment documents are important written materials and bases for a court's adjudication of a case and an important part of the court file; they include trial transcripts, complaints, answers, and the like. A judgment document records the process and outcome of the court's hearing of a case and is the carrier of the result of the litigation.
The pre-judgment documents contain the following information: the court's summary of the nature of the legal relationship involved in the lawsuit (the cause of action); the claims of the plaintiff or appellant; the defense of the defendant or appellee; the presentation of evidence, the debate, and the cross-examination opinions of both parties; and so on. This information is an important reference for the judge in rendering a judgment. A judgment document also contains this information and, in addition, includes the court's reasoning on the case, the law applied by the judge in reaching the judgment, the outcome of the judgment, and so on.
The server receives the first text acquired by the client. The first text is the pre-judgment document of the case currently awaiting judgment. The text type of a pre-judgment document may be divided by cause of action into trademark infringement disputes, right-to-life disputes, divorce disputes, and so on, and by case type into civil cases, administrative cases, and criminal cases. The server determines that the target text type of the current pre-judgment document is a civil case concerning a trademark infringement dispute. The preset text set contains a large number of judgment documents of decided cases. The server searches the preset text set for judgment documents of civil cases whose cause of action is a trademark infringement dispute and uses them as the plurality of second texts, determines the first similarity between the current pre-judgment document and each of these judgment documents, sorts the first similarities, determines the top 10 judgment documents as the target texts, and pushes the target texts to the client.
Through the above steps, a plurality of second texts of the same type as the first text are found in the preset text set according to the target text type of the acquired first text, which ensures that the pushed text resources are of the same type as the text the user wishes to find. From these second texts, those with a high similarity to the first text are taken as the target text and pushed to the client. The text resources pushed to the client therefore have the same text type as the first text acquired from the client and are similar to it, which makes the push more targeted and effective. This improves the push efficiency of text resources and solves the problem of low push efficiency in the related art.
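The flow described above (filter by text type, score by similarity, keep the top-ranked texts) can be sketched as follows. This is a minimal illustration, not the implementation of the application: the `Document` class, the `push_candidates` function, and the character-bigram Jaccard measure are all assumptions, since the text does not specify how similarity is computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Document:
    doc_id: str
    text_type: str  # hypothetical label, e.g. "civil/trademark infringement dispute"
    body: str


def char_bigrams(text: str) -> set:
    """Set of overlapping two-character substrings of the text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity measure (an assumption): Jaccard overlap of character bigrams."""
    x, y = char_bigrams(a), char_bigrams(b)
    if not x and not y:
        return 0.0
    return len(x & y) / len(x | y)


def push_candidates(first_text: str, target_type: str,
                    corpus: List[Document], top_n: int = 10) -> List[Document]:
    """Keep documents of the target type, rank them by similarity, return the top_n."""
    second_texts = [d for d in corpus if d.text_type == target_type]
    scored = [(jaccard(first_text, d.body), d) for d in second_texts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_n]]
```

Filtering by type before scoring keeps the similarity computation confined to documents of the same type as the first text, which is the point the paragraph above makes.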
Optionally, in determining the first similarity, the first text may be divided into a plurality of first text blocks, and a first target similarity may be determined between each first text block and the corresponding second text block in a second text. The first target similarities are then weighted according to how strongly each text block affects the similarity between the two texts, which yields the first similarity between the first text and each second text. For example, in step S206 above, the first text is divided according to a plurality of first keywords to obtain a first text block set, where each first keyword indicates a feature of a text paragraph, the first keywords correspond one-to-one to the first text blocks in the first text block set, and the first keywords also correspond one-to-one to the second text blocks in each second text. The first target similarity between the first text block and the second text block corresponding to each first keyword is determined, and the first similarity is then determined from the first target similarities and the first preset weight corresponding to each of them.
Optionally, in this embodiment, the second text blocks in a second text may be, but are not limited to being, obtained by the server preprocessing and parsing the second text. There may be a plurality of second text blocks, and the server may parse the second text into them according to the features of the paragraph content. For example, the server may parse the judgment documents in a judgment document library into the following sections: claims, defense, evidence, cross-examination opinions, issues in dispute, findings of fact, the court's opinion, and so on.
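One simple way to divide a document into blocks by keywords is sketched below. It assumes each section begins with a literal marker string; the function name `split_by_keywords` and the markers used in the usage note are hypothetical, and real judgment documents would likely need more robust parsing.

```python
from typing import Dict, Iterable


def split_by_keywords(text: str, keywords: Iterable[str]) -> Dict[str, str]:
    """Map each keyword found in the text to the block that starts at the
    keyword's first occurrence and ends where the next found keyword starts."""
    positions = sorted((text.find(k), k) for k in keywords if k in text)
    blocks = {}
    for i, (start, keyword) in enumerate(positions):
        end = positions[i + 1][0] if i + 1 < len(positions) else len(text)
        blocks[keyword] = text[start:end].strip()
    return blocks
```

For instance, calling it on `"Claims: pay damages. Defense: no infringement."` with the markers `["Claims:", "Defense:"]` yields one block per marker; markers that do not occur in the text are simply absent from the result.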
In the above optional implementation, the server parses the trial transcript in the first text into the following sections: the plaintiff's claims section (the facts and grounds of the suit and the claims as stated by the plaintiff); the defendant's defense section (the defendant's response to the plaintiff's claims); the evidence section (both parties' presentation of evidence); the cross-examination and argument section (the parties' cross-examination of and argument against each other); and the court inquiry section (the court's questions and both parties' answers).
The server parses the complaint or appeal petition into the following sections: the plaintiff's claims section and the statement of facts and grounds. It parses the answer or the answer to the appeal into the following section: the defendant's defense opinions.
The first text blocks include: complaint (claims section) + appeal petition (claims section); answer (defense opinions section) + answer to the appeal (defense opinions section); and trial transcript (court investigation and argument section) + complaint (facts and grounds section).
FIG. 3 is a schematic diagram of determining the first similarity according to an optional implementation of the present application. As shown in FIG. 3, the second text blocks of a second text include: judgment document (claims section), judgment document (defense opinions section), judgment document (fact determination section + court's opinion section), and the full text of the judgment document. The server determines that the first target similarity between complaint (claims section) + appeal petition (claims section) and judgment document (claims section) is S1; that between answer (defense opinions section) + answer to the appeal (defense opinions section) and judgment document (defense opinions section) is S2; and that between trial transcript (court investigation and argument section) + complaint (facts and grounds section) and judgment document (fact determination section + court's opinion section) is S3. The first preset weights corresponding to these first target similarities are W1, W2, and W3, respectively. The server then determines the first similarity from the first target similarities and their corresponding first preset weights.
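The pairing of first text blocks with second text blocks can be sketched as below. The block names, the `block_similarities` function, and the word-level Jaccard measure are assumptions for illustration; whitespace splitting does not tokenize Chinese text, so a real system would substitute a suitable tokenizer and similarity measure.

```python
from typing import Callable, Dict


def word_jaccard(a: str, b: str) -> float:
    """Illustrative similarity (an assumption): Jaccard overlap of lowercase word sets."""
    x, y = set(a.lower().split()), set(b.lower().split())
    if not x or not y:
        return 0.0
    return len(x & y) / len(x | y)


def block_similarities(first_blocks: Dict[str, str],
                       second_blocks: Dict[str, str],
                       similarity: Callable[[str, str], float] = word_jaccard
                       ) -> Dict[str, float]:
    """Compute one first target similarity per block name present in both texts."""
    return {name: similarity(first_blocks[name], second_blocks[name])
            for name in first_blocks if name in second_blocks}
```

With blocks keyed as, say, `"claims"`, `"defense"`, and `"facts"`, the returned values play the roles of S1, S2, and S3 in the description above.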
Optionally, the first similarity may be, but is not limited to being, determined in one of the following ways:
Mode one: determine a first weighted sum of the first target similarities according to the first preset weights, and use the first weighted sum as the first similarity.
For example, in the above optional implementation, the first similarity P = W1*S1 + W2*S2 + W3*S3.
Mode two: determine a first weighted average of the first target similarities according to the first preset weights, and use the first weighted average as the first similarity.
For example, in the above optional implementation, the first similarity P = (W1*S1 + W2*S2 + W3*S3)/3.
Mode three: determine a second target similarity between the first text and the second text; determine a second weighted sum of the first target similarities and the second target similarity according to the first preset weights and a second preset weight corresponding to the second target similarity; and use the second weighted sum as the first similarity.
For example, in the above optional implementation, if the second target similarity between the first text and the second text is X and the second preset weight corresponding to it is V, the first similarity P = W1*S1 + W2*S2 + W3*S3 + V*X.
Mode four: determine a second target similarity between the first text and the second text; determine a second weighted average of the first target similarities and the second target similarity according to the first preset weights and a second preset weight corresponding to the second target similarity; and use the second weighted average as the first similarity.
For example, in the above optional implementation, if the second target similarity between the first text and the second text is X and the second preset weight corresponding to it is V, the first similarity P = (W1*S1 + W2*S2 + W3*S3 + V*X)/4.
Optionally, in this embodiment, a second target similarity between the first text and the second text is determined, and its influence is taken into account when determining the first similarity between the two texts. This avoids selecting texts that are highly similar in part but not similar overall.
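The four modes above differ only in whether the whole-text term V*X is included and whether the sum is divided by the number of terms. A minimal sketch follows; the function and parameter names are hypothetical, and the division by the term count (3 or 4) follows the formulas as written above rather than normalizing by the sum of the weights.

```python
from typing import Optional, Sequence


def first_similarity(block_scores: Sequence[float],
                     block_weights: Sequence[float],
                     whole_score: Optional[float] = None,
                     whole_weight: Optional[float] = None,
                     average: bool = False) -> float:
    """Combine S1..Sn with W1..Wn, optionally adding V*X, optionally averaging."""
    if len(block_scores) != len(block_weights):
        raise ValueError("each block score needs exactly one weight")
    scores, weights = list(block_scores), list(block_weights)
    if whole_score is not None:
        if whole_weight is None:
            raise ValueError("whole_weight is required when whole_score is given")
        scores.append(whole_score)    # X, the second target similarity
        weights.append(whole_weight)  # V, the second preset weight
    total = sum(w * s for w, s in zip(weights, scores))
    # Modes two and four divide by the number of terms, as in the formulas above.
    return total / len(scores) if average else total
```

Mode one is the default call, mode two sets `average=True`, and modes three and four additionally pass `whole_score` and `whole_weight`.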
Optionally, the target text may be, but is not limited to being, determined from the plurality of second texts in one of the following ways:
Mode 1: determine that a second text whose first similarity falls within a preset threshold range is a target text. For example, if the preset threshold range is set to values above P0, every second text whose first similarity is greater than P0 is taken as a target text.
Mode 2: sort the second texts by first similarity from high to low, and determine that the top preset number of second texts are the target texts. For example, if the preset number is 1, the second text with the highest first similarity is determined as the target text; if the preset number is 10, the ten second texts with the highest first similarity are selected as the target texts.
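The two selection modes can be sketched as follows, assuming each second text has already been paired with its first similarity. The function names and the `(text, score)` tuple layout are illustrative choices, not part of the application.

```python
from typing import List, Tuple


def select_by_threshold(scored: List[Tuple[str, float]], p0: float) -> List[str]:
    """Mode 1: keep every second text whose first similarity is greater than p0."""
    return [text for text, score in scored if score > p0]


def select_top_n(scored: List[Tuple[str, float]], n: int) -> List[str]:
    """Mode 2: sort by first similarity, high to low, and keep the first n texts."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked[:n]]
```

Mode 1 returns a variable number of texts (possibly none), while Mode 2 always returns at most `n`, which may matter when the client displays a fixed-size list.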
Optionally, in step S202 above, the target text type may be, but is not limited to being, determined as follows: search the first text for the paragraph containing a second keyword and determine the found paragraph as a feature paragraph; obtain a third keyword from the feature paragraph; and look up the target text type corresponding to the third keyword in a correspondence between keywords and text types.
Optionally, in this embodiment, the second keyword may be used to indicate a feature of a target text paragraph and corresponds to one first text block in the first text block set, and the third keyword may be a keyword that characterizes the text type. For example, the first text is a trial transcript. The server searches the trial transcript for the paragraph containing the second keyword "the court is now in session" (宣布开庭) and determines that paragraph as the feature paragraph. It obtains the third keyword "trademark right dispute" from that paragraph and, from the correspondence between keywords and text types, finds that the target text type corresponding to "trademark right dispute" is a trademark ownership dispute case.
In the above optional implementation, the cause of action, trial level, and case type of the pre-judgment documents may be parsed. First, the description of the cause of action is extracted from the trial transcript (in the paragraph that announces the opening of the session). For example: "Judge: The court is now in session. The xxx People's Court of Beijing is today hearing in public, under the ordinary procedure in accordance with the law, the case of plaintiff xxx v. defendant xxx concerning a trademark ownership dispute. Acting judge xxx of this court serves as presiding judge and, together with acting judges xxx and xxx of this court, forms the collegial panel in accordance with the law; court clerk xxx keeps the court record." The server extracts the cause of action from this paragraph (trademark ownership dispute) and derives the case type (civil or administrative).
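The keyword-based type lookup can be sketched as below. The function names, the substring matching, and the contents of the keyword-to-type table are assumptions for illustration; the application only states that such a correspondence exists.

```python
from typing import Dict, List, Optional


def find_feature_paragraph(paragraphs: List[str], second_keyword: str) -> Optional[str]:
    """Return the first paragraph that contains the second keyword, if any."""
    for paragraph in paragraphs:
        if second_keyword in paragraph:
            return paragraph
    return None


def determine_target_type(paragraphs: List[str], second_keyword: str,
                          type_by_keyword: Dict[str, str]) -> Optional[str]:
    """Locate the feature paragraph, then map a third keyword found in it to a text type."""
    feature = find_feature_paragraph(paragraphs, second_keyword)
    if feature is None:
        return None
    for third_keyword, text_type in type_by_keyword.items():
        if third_keyword in feature:
            return text_type
    return None
```

Restricting the third-keyword search to the feature paragraph avoids matching a cause of action that is merely mentioned elsewhere in the transcript, for example in a party's argument.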
This embodiment further provides an apparatus for pushing a text resource. The apparatus is used to implement the above embodiments and optional implementations; what has already been described is not repeated. As used below, the term "module" refers to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 4 is a first structural block diagram of an apparatus for pushing a text resource according to an embodiment of the present application. As shown in FIG. 4, the apparatus includes:
a first determining module 402, configured to determine the target text type of the first text acquired by the client;
a searching module 404, coupled to the first determining module 402 and configured to search a preset text set for a plurality of second texts whose type is the target text type;
a second determining module 406, coupled to the searching module 404 and configured to determine the first similarity between the first text and the plurality of second texts;
a third determining module 408, coupled to the second determining module 406 and configured to determine that a second text whose first similarity satisfies a preset condition is the target text;
a pushing module 410, coupled to the third determining module 408 and configured to push the target text to the client.
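If the apparatus is implemented in software, the coupling of the five modules might look like the sketch below, where each module is an injected callable. The class name `PushApparatus`, the method `handle`, and the callable signatures are hypothetical and chosen only to mirror modules 402 to 410.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PushApparatus:
    determine_type: Callable[[str], str]                            # first determining module 402
    find_second_texts: Callable[[str], List[str]]                   # searching module 404
    similarity: Callable[[str, str], float]                         # second determining module 406
    select_targets: Callable[[List[Tuple[str, float]]], List[str]]  # third determining module 408
    push: Callable[[List[str]], None]                               # pushing module 410

    def handle(self, first_text: str) -> List[str]:
        """Run the modules in the order in which they are coupled."""
        target_type = self.determine_type(first_text)
        second_texts = self.find_second_texts(target_type)
        scored = [(text, self.similarity(first_text, text)) for text in second_texts]
        targets = self.select_targets(scored)
        self.push(targets)
        return targets
```

Injecting the modules as callables keeps each one replaceable, for example swapping a threshold-based selection for a top-N selection without touching the rest of the chain.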
Optionally, the above apparatus for pushing a text resource may be, but is not limited to being, applied in scenarios where text resources are pushed to a user, for example pushing text resources to users in a news application or in a text reading application.
Optionally, the apparatus may be, but is not limited to being, applied to a terminal device or a server device. The terminal device may include, but is not limited to, a mobile phone, a tablet computer, a PC, a smart wearable device, a smart electronic device, a smart home device, and so on.
Optionally, in the above embodiment, the client may be, but is not limited to, the client of an application. The application may include, but is not limited to, a news application, a text reading application, an instant messaging application, a browser application, and so on.
Optionally, in the above embodiment, the text type of a text resource may be, but is not limited to being, classified by the field the text resource relates to, such as sports, entertainment, technology, finance, or military affairs. The text type may also be classified according to the classification rules within a particular kind of text resource. For example, among legal documents, judgment documents may be classified by trial level into first-instance documents, second-instance documents, and so on, and by case type into civil case documents and administrative case documents; trial transcripts and judgment documents may be classified by cause of action into trademark infringement dispute documents, right-to-life dispute documents, divorce dispute documents, and so on.
Optionally, in this embodiment, the preset condition may be set as a condition for obtaining second texts with a high first similarity, for example that the first similarity is the highest or that the first similarity is above a preset value.
In an optional implementation, take as an example a server that, based on the current pre-judgment documents, intelligently recommends judgment documents of similar cases to a judge. Pre-judgment documents are important written materials and grounds on which a court decides a case and are an important part of the court file; they include trial transcripts, complaints, answers, and the like. A judgment document records the process and result of the court's hearing of a case and is the carrier of the outcome of the litigation.
The pre-judgment documents include the following information: the court's summary of the nature of the legal relationship involved in the case (the cause of action); the claims of the plaintiff or appellant; the defense of the defendant or appellee; and the parties' presentation of evidence, arguments, and cross-examination opinions. This information is an important reference for the judge in reaching a judgment. A judgment document contains the same information and additionally includes the court's reasoning on the case, the law the judge applied in reaching the decision, and the outcome of the judgment.
The server receives the first text acquired by the client. The first text is the pre-judgment document of a case currently awaiting judgment. The text type of a pre-judgment document may be classified by cause of action (trademark infringement dispute, right-to-life dispute, divorce dispute, and so on) and by case type (civil, administrative, or criminal). The server determines that the target text type of the current pre-judgment document is a civil case concerning a trademark infringement dispute. The preset text set contains a large number of judgment documents from decided cases. The server searches the preset text set for judgment documents of civil cases whose cause of action is a trademark infringement dispute and uses them as the plurality of second texts. It then determines the first similarity between the current pre-judgment document and each of these judgment documents, sorts the documents by first similarity, determines the top 10 judgment documents as the target texts, and pushes the target texts to the client.
With the above apparatus, a plurality of second texts of the same type as the first text are found in the preset text set according to the target text type of the acquired first text, which ensures that the pushed text resources are of the same type as the text the user wishes to find. From these second texts, those with a high similarity to the first text are taken as the target text and pushed to the client. The text resources pushed to the client therefore have the same text type as the first text acquired from the client and are similar to it, which makes the push more targeted and effective. This improves the push efficiency of text resources and solves the problem of low push efficiency in the related art.
FIG. 5 is a second structural block diagram of an apparatus for pushing a text resource according to an embodiment of the present application. As shown in FIG. 5, optionally, the second determining module 406 includes:
a dividing unit 52, configured to divide the first text according to a plurality of first keywords to obtain a first text block set, where each first keyword indicates a feature of a text paragraph, the first keywords correspond one-to-one to the first text blocks in the first text block set, and the first keywords also correspond one-to-one to the second text blocks in each second text;
a first determining unit 54, coupled to the dividing unit 52 and configured to determine the first target similarity between the first text block and the second text block corresponding to each first keyword;
a second determining unit 56, coupled to the first determining unit 54 and configured to determine the first similarity from the first target similarities and the first preset weight corresponding to each of them.
Optionally, in determining the first similarity, the first text may be divided into a plurality of first text blocks, and a first target similarity may be determined between each first text block and the corresponding second text block in a second text. The first target similarities are then weighted according to how strongly each text block affects the similarity between the two texts, which yields the first similarity between the first text and each second text.
Optionally, in this embodiment, the second text blocks in a second text may be, but are not limited to being, obtained by the server preprocessing and parsing the second text. There may be a plurality of second text blocks, and the server may parse the second text into them according to the features of the paragraph content. For example, the server may parse the judgment documents in a judgment document library into the following sections: claims, defense, evidence, cross-examination opinions, issues in dispute, findings of fact, the court's opinion, and so on.
In the above optional implementation, the server parses the trial transcript in the first text into the following sections: the plaintiff's claims section (the facts and grounds of the suit and the claims as stated by the plaintiff); the defendant's defense section (the defendant's response to the plaintiff's claims); the evidence section (both parties' presentation of evidence); the cross-examination and argument section (the parties' cross-examination of and argument against each other); and the court inquiry section (the court's questions and both parties' answers).
The server parses the complaint or appeal petition into the following sections: the plaintiff's claims section and the statement of facts and grounds. It parses the answer or the answer to the appeal into the following section: the defendant's defense opinions.
The first text blocks include: complaint (claims section) + appeal petition (claims section); answer (defense opinions section) + answer to the appeal (defense opinions section); and trial transcript (court investigation and argument section) + complaint (facts and grounds section).
FIG. 3 is a schematic diagram of determining the first similarity according to an optional implementation of the present application. As shown in FIG. 3, the second text blocks of a second text include: judgment document (claims section), judgment document (defense opinions section), judgment document (fact determination section + court's opinion section), and the full text of the judgment document. The server determines that the first target similarity between complaint (claims section) + appeal petition (claims section) and judgment document (claims section) is S1; that between answer (defense opinions section) + answer to the appeal (defense opinions section) and judgment document (defense opinions section) is S2; and that between trial transcript (court investigation and argument section) + complaint (facts and grounds section) and judgment document (fact determination section + court's opinion section) is S3. The first preset weights corresponding to these first target similarities are W1, W2, and W3, respectively. The server then determines the first similarity from the first target similarities and their corresponding first preset weights.
Optionally, the second determining unit 56 is configured to do one of the following:
determine a first weighted sum of the first target similarities according to the first preset weights, and use the first weighted sum as the first similarity; for example, in the above optional implementation, the first similarity P = W1*S1 + W2*S2 + W3*S3;
determine a first weighted average of the first target similarities according to the first preset weights, and use the first weighted average as the first similarity; for example, in the above optional implementation, the first similarity P = (W1*S1 + W2*S2 + W3*S3)/3;
determine a second target similarity between the first text and the second text, determine a second weighted sum of the first target similarities and the second target similarity according to the first preset weights and a second preset weight corresponding to the second target similarity, and use the second weighted sum as the first similarity; for example, in the above optional implementation, if the second target similarity between the first text and the second text is X and the second preset weight corresponding to it is V, the first similarity P = W1*S1 + W2*S2 + W3*S3 + V*X;
determine a second target similarity between the first text and the second text, determine a second weighted average of the first target similarities and the second target similarity according to the first preset weights and a second preset weight corresponding to the second target similarity, and use the second weighted average as the first similarity; for example, in the above optional implementation, if the second target similarity between the first text and the second text is X and the second preset weight corresponding to it is V, the first similarity P = (W1*S1 + W2*S2 + W3*S3 + V*X)/4.
Optionally, in this embodiment, a second target similarity between the first text and the second text is determined, and its influence is taken into account when determining the first similarity between the two texts. This avoids selecting texts that are highly similar in part but not similar overall.
Optionally, the third determining module 408 is configured to do one of the following:
determine that a second text whose first similarity falls within a preset threshold range is a target text; for example, if the preset threshold range is set to values above P0, every second text whose first similarity is greater than P0 is taken as a target text;
sort the second texts by first similarity from high to low, and determine that the top preset number of second texts are the target texts; for example, if the preset number is 1, the second text with the highest first similarity is determined as the target text, and if the preset number is 10, the ten second texts with the highest first similarity are selected as the target texts.
FIG. 6 is a third structural block diagram of an apparatus for pushing a text resource according to an embodiment of the present application. As shown in FIG. 6, optionally, the first determining module 402 includes:
a first searching unit 62, configured to search the first text for the paragraph containing a second keyword and determine the found paragraph as a feature paragraph;
an obtaining unit 64, coupled to the first searching unit 62 and configured to obtain a third keyword from the feature paragraph;
a second searching unit 66, coupled to the obtaining unit 64 and configured to look up the target text type corresponding to the third keyword in a correspondence between keywords and text types.
Optionally, in this embodiment, the second keyword may be used to indicate a feature of a target text paragraph, and the second keyword corresponds to one first text block in the first text block set. The third keyword may be a keyword used to characterize a text type.
For example, the first text is a court hearing transcript. The server searches the transcript for the paragraph containing the second keyword "court is now in session" and determines that paragraph to be the feature paragraph. It obtains the third keyword "trademark rights dispute" from that paragraph and, from the correspondence between keywords and text types, finds that the target text type corresponding to "trademark rights dispute" is a trademark ownership dispute case.
In the above optional implementation, the cause of action, trial level, and case type in the preceding document may be parsed. First, the description of the cause of action is extracted from the court hearing transcript (in the paragraph that announces the opening of the hearing). For example: "Judge: The court is now in session. The xxx People's Court of Beijing today, applying the ordinary procedure in accordance with the law, holds a public hearing in the trademark ownership dispute case of plaintiff xxx v. defendant xxx. Acting judge xxx of this court serves as presiding judge and, together with acting judges xxx and xxx of this court, forms a collegial panel in accordance with the law; clerk xxx keeps the court record." The server extracts the cause of action (trademark ownership dispute) from this paragraph and derives the case type (civil or administrative).
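The keyword lookup described above can be sketched as a short function. This is a minimal illustration, assuming paragraphs are separated by newlines and keywords are matched as plain substrings; the function name `detect_text_type`, the English keyword strings, and the mapping contents are hypothetical and are not taken from the application.

```python
from typing import Dict, Optional


def detect_text_type(
    first_text: str,
    second_keyword: str,
    keyword_to_type: Dict[str, str],
) -> Optional[str]:
    """Find the target text type of first_text, or None if it cannot be found."""
    for paragraph in first_text.split("\n"):
        # A paragraph containing the second keyword is a feature paragraph.
        if second_keyword not in paragraph:
            continue
        # Look for a third keyword inside the feature paragraph and map it
        # to a text type through the keyword-to-type correspondence.
        for third_keyword, text_type in keyword_to_type.items():
            if third_keyword in paragraph:
                return text_type
    return None


# Hypothetical transcript and correspondence table.
transcript = (
    "Clerk: Please be seated.\n"
    "Judge: The court is now in session. Today this court hears the "
    "trademark ownership dispute between plaintiff xxx and defendant xxx.\n"
    "Judge: The plaintiff may state the claims."
)
type_table = {"trademark ownership dispute": "trademark ownership dispute case"}
detected = detect_text_type(transcript, "court is now in session", type_table)
```

Restricting the search for the third keyword to the feature paragraph keeps incidental mentions elsewhere in the transcript from deciding the text type.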
It should be noted that each of the above modules may be implemented in software or in hardware. In the latter case, this may be done in, but is not limited to, the following ways: the modules are all located in the same processor, or the modules are located in multiple processors respectively.
The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. A person of ordinary skill in the art may modify the technical solutions of the present application or replace them with equivalents without departing from the spirit and scope of the present application. The scope of protection of the present application shall be as defined in the claims.
An embodiment of the present application further provides a storage medium that includes a stored program, wherein the program, when run, performs the method described in any of the above.
Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1, determining a target text type of a first text acquired by a client;
S2, searching a preset text set for a plurality of second texts whose type is the target text type;
S3, determining a first similarity between the first text and the plurality of second texts;
S4, determining that a second text whose first similarity satisfies a preset condition is a target text;
S5, pushing the target text to the client.
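Steps S1 to S5 can be read as one short pipeline, sketched below. The sketch makes several assumptions that are not part of the application: the similarity measure is a simple Jaccard overlap of whitespace-separated tokens, the preset condition is "top N by similarity", and the names `jaccard`, `push_text_resources`, `classify`, and `send` are hypothetical.

```python
from typing import Callable, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity, used here only as a stand-in measure."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def push_text_resources(
    first_text: str,
    classify: Callable[[str], str],
    text_set: List[Tuple[str, str]],   # (text type, text) pairs
    top_n: int,
    send: Callable[[str], None],
) -> List[str]:
    # S1: determine the target text type of the first text.
    target_type = classify(first_text)
    # S2: find the second texts of that type in the preset text set.
    second_texts = [text for text_type, text in text_set if text_type == target_type]
    # S3: determine the first similarity for each second text.
    scored = [(jaccard(first_text, text), text) for text in second_texts]
    # S4: keep the texts that satisfy the preset condition (here: top N).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    targets = [text for _, text in scored[:top_n]]
    # S5: push each target text to the client.
    for text in targets:
        send(text)
    return targets
```

Filtering by type in S2 before scoring in S3 means similarity is only computed against texts of the same type, which is the source of the efficiency gain the application describes.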
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
An embodiment of the present application further provides a processor configured to run a program, wherein the program, when run, performs the steps of any one of the above methods.
Optionally, in this embodiment, the program is used to perform the following steps:
S1, determining a target text type of a first text acquired by a client;
S2, searching a preset text set for a plurality of second texts whose type is the target text type;
S3, determining a first similarity between the first text and the plurality of second texts;
S4, determining that a second text whose first similarity satisfies a preset condition is a target text;
S5, pushing the target text to the client.
Optionally, for specific examples in this embodiment, refer to the examples described in the above embodiments and optional implementations; details are not repeated here.
Obviously, those skilled in the art should understand that the modules or steps of the present application described above may be implemented by a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from that given here, or the modules or steps may each be made into an individual integrated circuit module, or several of them may be made into a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The above are merely optional embodiments of the present application and are not intended to limit it. For those skilled in the art, various modifications and changes may be made to the present application. Any modification, equivalent replacement, or improvement made within the principles of the present application shall fall within its scope of protection.
Industrial applicability:
As can be seen from the above description, the present application searches the preset text set, according to the target text type of the acquired first text, for a plurality of second texts of the same type as the first text, which ensures that the pushed text resources are of the same type as the text the user wants to find. It then obtains, from the second texts found, the second texts with higher similarity to the first text as target texts and pushes the target texts to the client, so that the text resources pushed to the client have the same text type as the first text acquired from the client and are relatively similar to it. This improves the pertinence and effectiveness of pushing text resources to the client, and therefore improves the push efficiency of text resources, thereby solving the problem in the related art that the push efficiency of text resources is low.

Claims (10)

  1. A method for pushing a text resource, comprising:
    determining a target text type of a first text acquired by a client;
    searching a preset text set for a plurality of second texts whose type is the target text type;
    determining a first similarity between the first text and the plurality of second texts;
    determining that a second text whose first similarity satisfies a preset condition is a target text;
    pushing the target text to the client.
  2. The method according to claim 1, wherein determining the first similarity between the first text and the plurality of second texts comprises:
    dividing the first text according to a plurality of first keywords to obtain a first text block set, wherein the first keywords are used to indicate features of text paragraphs, the first keywords are in one-to-one correspondence with first text blocks in the first text block set, and the first keywords are in one-to-one correspondence with second text blocks in each of the second texts;
    determining a first target similarity between the first text block and the second text block corresponding to each of the first keywords;
    determining the first similarity according to the first target similarities and a first preset weight corresponding to each of the first target similarities.
  3. The method according to claim 2, wherein determining the first similarity according to the first target similarities and the first preset weight corresponding to each of the first target similarities comprises one of the following:
    determining a first weighted sum of the first target similarities according to the first preset weights, and using the first weighted sum as the first similarity;
    determining a first weighted average of the first target similarities according to the first preset weights, and using the first weighted average as the first similarity;
    determining a second target similarity between the first text and the second text; determining a second weighted sum of the first target similarities and the second target similarity according to the first preset weights and a second preset weight corresponding to the second target similarity, and using the second weighted sum as the first similarity;
    determining a second target similarity between the first text and the second text; determining a second weighted average of the first target similarities and the second target similarity according to the first preset weights and a second preset weight corresponding to the second target similarity, and using the second weighted average as the first similarity.
  4. The method according to claim 1, wherein determining that the second text whose first similarity satisfies the preset condition is the target text comprises one of the following:
    determining that the second text whose first similarity falls within a preset threshold range is the target text;
    sorting the second texts by the first similarity from high to low, and determining that the top preset number of the second texts are the target text.
  5. The method according to any one of claims 1 to 4, wherein determining the target text type of the first text acquired by the client comprises:
    searching the first text for the paragraph in which a second keyword is located, and determining the found paragraph as a feature paragraph;
    obtaining a third keyword from the feature paragraph;
    looking up, in a correspondence between keywords and text types, the target text type corresponding to the third keyword.
  6. An apparatus for pushing a text resource, comprising:
    a first determining module, configured to determine a target text type of a first text acquired by a client;
    a searching module, configured to search a preset text set for a plurality of second texts whose type is the target text type;
    a second determining module, configured to determine a first similarity between the first text and the plurality of second texts;
    a third determining module, configured to determine that a second text whose first similarity satisfies a preset condition is a target text;
    a pushing module, configured to push the target text to the client.
  7. The apparatus according to claim 6, wherein the second determining module comprises:
    a dividing unit, configured to divide the first text according to a plurality of first keywords to obtain a first text block set, wherein the first keywords are used to indicate features of text paragraphs, the first keywords are in one-to-one correspondence with first text blocks in the first text block set, and the first keywords are in one-to-one correspondence with second text blocks in each of the second texts;
    a first determining unit, configured to determine a first target similarity between the first text block and the second text block corresponding to each of the first keywords;
    a second determining unit, configured to determine the first similarity according to the first target similarities and a first preset weight corresponding to each of the first target similarities.
  8. The apparatus according to claim 6 or 7, wherein the first determining module comprises:
    a first searching unit, configured to search the first text for the paragraph in which a second keyword is located, and to determine the found paragraph as a feature paragraph;
    an obtaining unit, configured to obtain a third keyword from the feature paragraph;
    a second searching unit, configured to look up, in a correspondence between keywords and text types, the target text type corresponding to the third keyword.
  9. A storage medium comprising a stored program, wherein the program, when run, performs the method according to any one of claims 1 to 5.
  10. A processor configured to run a program, wherein the program, when run, performs the method according to any one of claims 1 to 5.
PCT/CN2018/112379 2017-10-31 2018-10-29 Text resource push method and apparatus, storage medium, and processor WO2019085856A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711053298.2 2017-10-31
CN201711053298.2A CN109729126A (en) 2017-10-31 2017-10-31 Method for pushing, device, storage medium and the processor of textual resources

Publications (1)

Publication Number Publication Date
WO2019085856A1 (en) 2019-05-09

Family

ID=66293364

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/112379 WO2019085856A1 (en) 2017-10-31 2018-10-29 Text resource push method and apparatus, storage medium, and processor

Country Status (2)

Country Link
CN (1) CN109729126A (en)
WO (1) WO2019085856A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784034A (en) * 2019-11-01 2021-05-11 阿里巴巴集团控股有限公司 Abstract generation method and device and computer equipment
CN112989820A (en) * 2021-03-22 2021-06-18 平安国际智慧城市科技股份有限公司 Legal document positioning method, device, equipment and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532359A (en) * 2019-06-14 2019-12-03 平安科技(深圳)有限公司 Legal provision query method, apparatus, computer equipment and storage medium
CN110362592B (en) * 2019-06-17 2023-06-23 平安科技(深圳)有限公司 Method, device, computer equipment and storage medium for pushing arbitration guide information

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101055581A (en) * 2006-04-13 2007-10-17 Lg电子株式会社 Document management system and method
US8255405B2 (en) * 2009-01-30 2012-08-28 Hewlett-Packard Development Company, L.P. Term extraction from service description documents
CN103631769A (en) * 2012-08-23 2014-03-12 北京百度网讯科技有限公司 Method and device for judging consistency between file content and title
CN103838735A (en) * 2012-11-21 2014-06-04 大连灵动科技发展有限公司 Data retrieval method for improving retrieval efficiency and quality
CN104298704A (en) * 2014-08-06 2015-01-21 南京奇幻通信科技有限公司 Method and system for pushing text on blog

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140359476A1 (en) * 2013-05-30 2014-12-04 Kabam, Inc. System and method for forwarding external notifications of events in a virtual space from a presentation control device to a user device
CN106294502B (en) * 2015-06-09 2020-06-23 北京搜狗科技发展有限公司 Electronic book information processing method and device
CN107273391A (en) * 2016-04-08 2017-10-20 北京国双科技有限公司 Document recommends method and apparatus

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784034A (en) * 2019-11-01 2021-05-11 阿里巴巴集团控股有限公司 Abstract generation method and device and computer equipment
CN112989820A (en) * 2021-03-22 2021-06-18 平安国际智慧城市科技股份有限公司 Legal document positioning method, device, equipment and storage medium
CN112989820B (en) * 2021-03-22 2022-12-02 平安国际智慧城市科技股份有限公司 Legal document positioning method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN109729126A (en) 2019-05-07

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18873992; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 18873992; Country of ref document: EP; Kind code of ref document: A1)