
CN113836886A - News title similarity identification method

Info

Publication number: CN113836886A
Application number: CN202110948184.4A
Authority: CN
Inventors: 王欢; 马云腾; 夏茂晋
Original assignee: Beijing Qingbo Intelligent Technology Co ltd
Current assignee: Beijing Qingbo Intelligent Technology Co ltd
Priority date: 2021-08-18
Filing date: 2021-08-18
Publication date: 2021-12-24

Abstract

The invention discloses a news title similarity identification method comprising the following steps: 1. inputting two titles; 2. removing special characters from both titles; 3. counting the characters that the two cleaned titles have in common to obtain the number of shared characters; 4. calculating the ratio of the number of shared characters to the length of the shorter title, and judging the two titles to be similar if the ratio is greater than 0.5 and dissimilar otherwise. The method is simple, fast, and highly portable.
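The decision rule in step 4 can be written as a single formula. The notation is introduced here for illustration only: A' and B' are the two titles after special characters are removed, |·| is length in characters, and n_{A'}(x) is the number of times character x occurs in A'. Counting shared characters with multiplicity (the sum of minimum counts) is an assumption; the text does not say whether a repeated character counts once or several times.

```latex
s(A, B) = \frac{c(A', B')}{\min\left(|A'|,\, |B'|\right)},
\qquad
c(A', B') = \sum_{x} \min\bigl(n_{A'}(x),\, n_{B'}(x)\bigr),
\qquad
\text{similar} \iff s(A, B) > 0.5
```

Under this reading c(A', B') can never exceed the length of the shorter title, so s(A, B) always lies between 0 and 1.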

Description

News title similarity identification method

Technical Field

The invention relates to the technical field of text recognition, and in particular to a news title similarity identification method.

Background

Existing similar-text recognition techniques mainly compute text similarity using a dictionary or feature engineering, so the accuracy of the algorithm depends largely on the quality of that dictionary or those features.

However, for short texts such as news titles, which contain few words and little semantic information, it is difficult to build an accurate dictionary or feature set. As a result, existing techniques struggle to capture the key information in such short texts, compute their similarity poorly, and recognize similar texts at a low rate.

In other words, existing similar-text recognition techniques have a low recognition rate for short texts such as news titles.

Disclosure of Invention

To solve this problem, the invention adopts the following technical solution:

A news title similarity identification method comprises the following steps:

1. inputting two titles;

2. removing special characters in the two titles;

3. counting the characters that the two cleaned titles have in common to obtain the number of shared characters;

4. calculating the ratio of the number of shared characters to the length of the shorter title, and judging the two titles to be similar if the ratio is greater than 0.5, or dissimilar otherwise.
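The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's reference implementation. The names `clean_title`, `shared_char_count`, and `titles_similar` are hypothetical. "Special characters" is assumed to mean anything for which `str.isalnum()` is false (punctuation, symbols, whitespace), which keeps Chinese characters, letters, and digits. Shared characters are assumed to be counted with multiplicity via `collections.Counter` intersection, and a title that is empty after cleaning is assumed to be dissimilar to everything.

```python
from collections import Counter


def clean_title(title: str) -> str:
    """Step 2 (assumed rule): keep only characters where str.isalnum() is True.

    Chinese characters, letters, and digits survive; punctuation such as
    "!" or full-width "，" and all whitespace are dropped.
    """
    return "".join(ch for ch in title if ch.isalnum())


def shared_char_count(a: str, b: str) -> int:
    """Step 3 (assumed rule): count shared characters with multiplicity.

    Counter & Counter keeps the minimum count of each character, so a
    character occurring twice in both titles contributes 2.
    """
    return sum((Counter(a) & Counter(b)).values())


def titles_similar(title_a: str, title_b: str, threshold: float = 0.5) -> bool:
    """Steps 1-4: similar when shared / len(shorter cleaned title) > threshold."""
    a, b = clean_title(title_a), clean_title(title_b)
    shortest = min(len(a), len(b))
    if shortest == 0:
        # Assumption: a title that is empty after cleaning is never similar.
        return False
    ratio = shared_char_count(a, b) / shortest
    return ratio > threshold


if __name__ == "__main__":
    # Hypothetical titles, not taken from the patent text.
    print(titles_similar("幽门螺杆菌感染过半！", "过半人群感染幽门螺杆菌"))  # True
```

Note that the comparison is strict, so a ratio of exactly 0.5 is judged dissimilar, matching "greater than 0.5" in step 4. Counting distinct characters instead (`len(set(a) & set(b))`) would be an equally plausible reading of step 3.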

Working principle and beneficial effects: the method is simple, fast, and highly portable.

Detailed Description

The invention will be better understood from the following examples.

A news title similarity identification method comprises the following steps:

1. inputting two titles, for example a: "More than half of our country's population is infected with Helicobacter pylori!" and b: "More than half of people in China have been infected with Helicobacter pylori bacteria";

2. removing special characters in the two titles;
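Step 2 of this example can be sketched with a single regular expression. The text does not define "special characters", so this sketch assumes they are everything other than letters, digits, and CJK characters, including whitespace. `remove_special_characters` is a hypothetical name, and the demo inputs are hypothetical rather than the original Chinese titles, of which the example titles are English translations.

```python
import re

# Matches runs of non-word characters, plus the underscore that \w would keep.
_SPECIAL = re.compile(r"[\W_]+")


def remove_special_characters(title: str) -> str:
    r"""Step 2 sketch: delete punctuation, symbols, whitespace, and underscores.

    Assumption: for str patterns Python's \w is Unicode-aware, so Chinese
    characters, Latin letters, and digits are kept.
    """
    return _SPECIAL.sub("", title)


if __name__ == "__main__":
    # Hypothetical inputs in the style of the example titles.
    print(remove_special_characters("幽门螺杆菌感染过半！"))  # 幽门螺杆菌感染过半
    print(remove_special_characters("Helicobacter pylori: over half infected!"))
    # prints Helicobacterpylorioverhalfinfected
```

Removing whitespace loses nothing for Chinese titles, which do not separate words with spaces; for English input it simply concatenates the words.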

Claims

1. A news title similarity identification method, characterized in that it comprises the following steps:

1. inputting two titles;

2. removing special characters in the two titles;