CN111626053A - Method and device for recognizing descriptor of new case means, electronic device and storage medium

Info

Publication number: CN111626053A
Application number: CN202010438516.XA
Authority: CN
Inventors: 彭涛; 杜晶; 杨欣雨
Original assignee: Beijing Mingyi Technology Co ltd
Current assignee: Beijing Mingyi Technology Co ltd
Priority date: 2020-05-21
Filing date: 2020-05-21
Publication date: 2020-09-04
Anticipated expiration: 2040-05-21
Also published as: CN111626053B

Abstract

The present disclosure provides a method and apparatus for recognizing new crime-means descriptors, an electronic device, and a storage medium. One embodiment of the method comprises: acquiring a recent theft-robbery-fraud historical alarm receiving and handling text set; performing word segmentation on each text in the set to obtain a corresponding word segmentation sequence, and generating a target word segmentation sequence set from the word segmentation sequences thus obtained; generating a binary spliced word library from the binary spliced words formed by every two adjacent segmented words in each target word segmentation sequence of the target word segmentation sequence set; and, for each binary spliced word in the binary spliced word library, performing a recognition operation to determine whether the binary spliced word is a new crime-means descriptor. This embodiment automatically extracts new crime-means descriptors from the recent theft-robbery-fraud historical alarm receiving and handling text set.

Description

Method and device for recognizing descriptor of new case means, electronic device and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular to a method and an apparatus for recognizing new crime-means descriptors, an electronic device, and a storage medium.

Background

A public security organ generates an alarm receiving text after receiving an alarm and an alarm handling text after handling the alarm. The alarm receiving and handling text comprises the alarm receiving text and the alarm handling text. Society changes rapidly, and as cases of all kinds occur, public security organs generate corresponding alarm receiving and handling texts every day. In theft and robbery cases in particular, crime means take many forms, and over time the alarm receiving and handling texts describing such cases come to contain descriptions of new crime means that have not appeared before. Crime means (modus operandi) here refers to the methods and measures a perpetrator adopts to achieve their aim when committing a crime. For example, theft and robbery cases may involve specific crime means such as sneak-in theft through doors, theft by climbing through windows, robbery at knifepoint, robbery on foot, and robbery by motorcycle. Public security organs need to learn of newly emerging crime means promptly and deploy corresponding handling schemes as soon as possible. That is, it is important to promptly extract, from recently generated alarm receiving and handling texts, new crime-means descriptors, i.e., words that describe crime means and have not appeared before.

At present, however, new crime-means descriptors in recently generated alarm receiving and handling texts are mostly extracted manually. This requires considerable manpower and time, so new crime means cannot be discovered and handled promptly, which endangers people's lives and property. In addition, alarm receiving and handling texts are mostly written in natural language that is highly colloquial and irregular, which makes manual extraction difficult; and because manual extraction depends on personal experience, the learning cost is also high.

Disclosure of Invention

The present disclosure proposes a method and apparatus for recognizing new crime-means descriptors, an electronic device, and a storage medium.

In a first aspect, the present disclosure provides a method for recognizing new crime-means descriptors, including: acquiring a recent theft-robbery-fraud historical alarm receiving and handling text set, wherein the recent theft-robbery-fraud historical alarm receiving and handling text set is a set of historical alarm receiving and handling texts that were generated within the most recent preset crime-means discovery duration and that describe theft, robbery, and fraud cases; performing word segmentation on each text in the recent theft-robbery-fraud historical alarm receiving and handling text set to obtain a corresponding word segmentation sequence, and generating a target word segmentation sequence set from the word segmentation sequences thus obtained; generating a binary spliced word library from the binary spliced words formed by every two adjacent segmented words in each target word segmentation sequence of the target word segmentation sequence set; and, for each binary spliced word in the binary spliced word library, performing the following recognition operation: calculating the word frequency, the degree of freedom, and the degree of solidification of the binary spliced word based on the target word segmentation sequence set, and, in response to determining that the binary spliced word satisfies every condition in a preset new word discovery condition set, determining the binary spliced word to be a new crime-means descriptor, wherein the preset new word discovery condition set comprises at least one of the following conditions: the word frequency of the binary spliced word is greater than a preset word frequency threshold, the degree of solidification of the binary spliced word is greater than a preset degree-of-solidification threshold, and the degree of freedom of the binary spliced word is greater than a preset degree-of-freedom threshold.
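The final decision in the recognition operation can be illustrated with a minimal Python sketch. It assumes the three statistics have already been computed; the function name, parameter names, and threshold values are hypothetical, and a threshold left as None models a condition that is not included in the preset new word discovery condition set.

```python
from typing import Optional


def is_new_descriptor(
    word_freq: float,
    solidification: float,
    freedom: float,
    *,
    freq_threshold: Optional[float] = None,
    solidification_threshold: Optional[float] = None,
    freedom_threshold: Optional[float] = None,
) -> bool:
    """Return True only if every condition present in the condition set holds."""
    checks = [
        (word_freq, freq_threshold),
        (solidification, solidification_threshold),
        (freedom, freedom_threshold),
    ]
    # Keep only the conditions that the (assumed) configuration includes.
    active = [(value, limit) for value, limit in checks if limit is not None]
    if not active:
        raise ValueError("the condition set must contain at least one condition")
    # Each comparison is strict ("greater than"), as stated in the text.
    return all(value > limit for value, limit in active)
```

With all three thresholds supplied, a binary spliced word is accepted only when it exceeds each of them; supplying a single threshold reduces the test to that one condition.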

In some optional embodiments, performing word segmentation on each text in the recent theft-robbery-fraud historical alarm receiving and handling text set to obtain a corresponding word segmentation sequence includes: performing word segmentation on each text in the set based on a preset word segmentation dictionary to obtain the corresponding word segmentation sequence; and the method further includes: adding each binary spliced word in the binary spliced word library that is determined to be a new crime-means descriptor to the preset word segmentation dictionary.

In some optional embodiments, the preset crime-means discovery duration is determined in advance by the following duration determination step: for each candidate duration in a preset candidate duration set, performing the following recognition accuracy determination operation: acquiring a set of historical alarm receiving and handling texts that were generated within the candidate duration and that describe theft, robbery, and fraud cases, together with a corresponding annotated new crime-means descriptor set; performing word segmentation on each historical alarm receiving and handling text in the acquired set to obtain a corresponding word segmentation sequence, and generating a word segmentation sequence set corresponding to the candidate duration from the word segmentation sequences thus obtained; generating a binary spliced word library corresponding to the candidate duration from the binary spliced words formed by every two adjacent segmented words in the word segmentation sequence set corresponding to the candidate duration; for each binary spliced word in the binary spliced word library corresponding to the candidate duration, calculating the word frequency, the degree of freedom, and the degree of solidification of the binary spliced word based on the word segmentation sequence set corresponding to the candidate duration, and determining the binary spliced word to be a correctly recognized word either in response to determining that it satisfies every condition in the preset new word discovery condition set and belongs to the annotated new crime-means descriptor set, or in response to determining that it fails at least one condition in the preset new word discovery condition set and does not belong to the annotated new crime-means descriptor set; and determining, as the recognition accuracy corresponding to the candidate duration, the ratio of the number of correctly recognized words in the binary spliced word library corresponding to the candidate duration to the number of binary spliced words in that library; and determining the candidate duration with the highest recognition accuracy in the preset candidate duration set as the preset crime-means discovery duration.
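The duration determination step can be sketched in Python as follows. The sketch assumes that, for each candidate duration, the recognition operation has already produced a mapping from every binary spliced word in that duration's library to whether it satisfied every condition; the function names and the data layout are hypothetical.

```python
from typing import Dict, Set, Tuple


def recognition_accuracy(predicted: Dict[str, bool], annotated: Set[str]) -> float:
    """Share of binary spliced words whose prediction agrees with the annotation.

    predicted: word -> True if the word satisfied every new word discovery condition.
    annotated: the annotated descriptor set for the same candidate duration.
    """
    if not predicted:
        return 0.0
    # A word is correct when "met all conditions" matches "is annotated".
    correct = sum(1 for word, met in predicted.items() if met == (word in annotated))
    return correct / len(predicted)


def best_duration(results: Dict[int, Tuple[Dict[str, bool], Set[str]]]) -> int:
    """Return the candidate duration (here in days) with the highest accuracy."""
    return max(results, key=lambda d: recognition_accuracy(*results[d]))


# Toy data: two candidate durations with made-up predictions and annotations.
results = {
    3: ({"a b": True, "b c": False, "c d": True}, {"a b"}),
    5: ({"a b": True, "b c": False}, {"a b"}),
}
chosen = best_duration(results)
```

If several candidate durations tie for the highest accuracy, this sketch keeps the first one in iteration order; the source does not specify a tie-breaking rule.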

In some optional embodiments, calculating, for each binary spliced word in the binary spliced word library, the word frequency, the degree of freedom, and the degree of solidification of the binary spliced word based on the target word segmentation sequence set includes: for each binary spliced word x in the binary spliced word library, formed by splicing segmented word x₁ and segmented word x₂, performing the following calculation operation: counting the word frequency P(x) of the binary spliced word x in the target word segmentation sequence set, the word frequency P(x₁) of segmented word x₁ in the target word segmentation sequence set, and the word frequency P(x₂) of segmented word x₂ in the target word segmentation sequence set; calculating the degree of solidification Aglomeration(x) of the binary spliced word x according to the following formula:

generating a preceding adjacent word set Pre_x corresponding to the binary spliced word x from each segmented word that is located before and adjacent to the binary spliced word x in the word segmentation sequences of the target word segmentation sequence set; counting the word frequency P(y) of each word y of the preceding adjacent word set Pre_x in the target word segmentation sequence set; generating a following adjacent word set Post_x corresponding to the binary spliced word x from each segmented word that is located after and adjacent to the binary spliced word x in the word segmentation sequences of the target word segmentation sequence set; counting the word frequency P(z) of each word z of the following adjacent word set Post_x in the target word segmentation sequence set; and calculating the degree of freedom Free(x) of the binary spliced word x according to the following formula:

Free(x) = min(H(Pre_x), H(Post_x))
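A minimal Python sketch of the degree-of-freedom calculation follows. The definition of H is not given in this text, so the sketch assumes H is the Shannon entropy (natural logarithm) of how often each adjacent word appears next to x, which is the usual choice in new-word discovery; that assumption, the function names, and the list-of-lists data layout are all hypothetical.

```python
import math
from collections import Counter


def entropy(counts: Counter) -> float:
    """Shannon entropy (natural log) of a count distribution; 0.0 if empty."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def degree_of_freedom(sequences, x1, x2) -> float:
    """Free(x) = min(H(Pre_x), H(Post_x)) for the binary spliced word x = x1 + x2."""
    pre, post = Counter(), Counter()
    for seq in sequences:
        for i in range(len(seq) - 1):
            if seq[i] == x1 and seq[i + 1] == x2:
                if i > 0:                # a segmented word precedes x
                    pre[seq[i - 1]] += 1
                if i + 2 < len(seq):     # a segmented word follows x
                    post[seq[i + 2]] += 1
    return min(entropy(pre), entropy(post))


# Toy sequences: x appears with two different neighbours on each side.
toy = [["a", "hold", "knife", "b"], ["c", "hold", "knife", "d"]]
free = degree_of_freedom(toy, "hold", "knife")
```

A binary spliced word that always appears with the same neighbour on one side gets a degree of freedom of zero under this assumption, which marks it as likely being a fragment of a longer expression.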

In a second aspect, the present disclosure provides an apparatus for recognizing new crime-means descriptors, including: an acquisition unit configured to acquire a recent theft-robbery-fraud historical alarm receiving and handling text set, wherein the recent theft-robbery-fraud historical alarm receiving and handling text set is a set of historical alarm receiving and handling texts that were generated within the most recent preset crime-means discovery duration and that describe theft, robbery, and fraud cases; a first generation unit configured to perform word segmentation on each text in the recent theft-robbery-fraud historical alarm receiving and handling text set to obtain a corresponding word segmentation sequence, and to generate a target word segmentation sequence set from the word segmentation sequences thus obtained; a second generation unit configured to generate a binary spliced word library from the binary spliced words formed by every two adjacent segmented words in each target word segmentation sequence of the target word segmentation sequence set; and a recognition unit configured to perform the following recognition operation on each binary spliced word in the binary spliced word library: calculating the word frequency, the degree of freedom, and the degree of solidification of the binary spliced word based on the target word segmentation sequence set, and, in response to determining that the binary spliced word satisfies every condition in a preset new word discovery condition set, determining the binary spliced word to be a new crime-means descriptor, wherein the preset new word discovery condition set comprises at least one of the following conditions: the word frequency of the binary spliced word is greater than a preset word frequency threshold, the degree of solidification of the binary spliced word is greater than a preset degree-of-solidification threshold, and the degree of freedom of the binary spliced word is greater than a preset degree-of-freedom threshold.

In some optional embodiments, performing word segmentation on each text in the recent theft-robbery-fraud historical alarm receiving and handling text set to obtain a corresponding word segmentation sequence includes: performing word segmentation on each text in the set based on a preset word segmentation dictionary to obtain the corresponding word segmentation sequence; and the apparatus further includes: an adding unit configured to add each binary spliced word in the binary spliced word library that is determined to be a new crime-means descriptor to the preset word segmentation dictionary.

generating a preceding adjacent word set Pre_x corresponding to the binary spliced word x from each segmented word that is located before and adjacent to the binary spliced word x in the word segmentation sequences of the target word segmentation sequence set; counting the word frequency P(y) of each word y of the preceding adjacent word set Pre_x in the target word segmentation sequence set; generating a following adjacent word set Post_x corresponding to the binary spliced word x from each segmented word that is located after and adjacent to the binary spliced word x in the word segmentation sequences of the target word segmentation sequence set; counting the word frequency P(z) of each word z of the following adjacent word set Post_x in the target word segmentation sequence set; and calculating the degree of freedom Free(x) of the binary spliced word x according to the following formula:

Free(x) = min(H(Pre_x), H(Post_x))

In a third aspect, the present disclosure provides an electronic device, comprising: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect.

In a fourth aspect, the present disclosure provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by one or more processors, implements the method as described in any of the implementations of the first aspect.

In order to enable public security organs to recognize new crime-means descriptors in recently generated alarm receiving and handling texts, the applicant found through research that the alarm receiving and handling texts of theft, robbery, and fraud cases describe such cases, and that such cases often involve different crime means. Because these texts mostly describe different crime means, a binary spliced word that appears frequently in the recent theft-robbery-fraud historical alarm receiving and handling texts is highly likely to be a new word describing a crime means. Based on this finding, the method and apparatus provided by the present disclosure first acquire a recent theft-robbery-fraud historical alarm receiving and handling text set, i.e., the set of historical alarm receiving and handling texts that were generated within the most recent preset crime-means discovery duration and that describe theft, robbery, and fraud cases. Word segmentation is then performed on each text in the set to obtain a corresponding word segmentation sequence, and a target word segmentation sequence set is generated from the word segmentation sequences thus obtained. Next, a binary spliced word library is generated from the binary spliced words formed by every two adjacent segmented words in the target word segmentation sequence set. Then, for each binary spliced word in the binary spliced word library, the word frequency, the degree of freedom, and the degree of solidification of the binary spliced word are calculated based on the target word segmentation sequence set, and the binary spliced word is determined to be a new crime-means descriptor in response to determining that it satisfies every condition in a preset new word discovery condition set. The whole process requires no manual operation, which reduces the labor and time cost of discovering new crime-means descriptors. In practice, public security organs generate a large number of alarm receiving and handling texts every day, and the method can quickly recognize new crime-means descriptors in a large volume of recently generated theft-robbery-fraud alarm receiving and handling texts. With the recognized new crime-means descriptors, public security organs can track new trends in crime means promptly and respond accordingly. This improves their response speed to new crime means, helps protect people's lives and property and maintain social stability, and reduces losses of life and property and factors of social instability.

Drawings

Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;

FIG. 2 is a flow diagram of one embodiment of the new crime-means descriptor recognition method according to the present disclosure;

FIG. 3 is a flow chart of one embodiment of a duration determination step according to the present disclosure;

FIG. 4 is a flow diagram of yet another embodiment of the new crime-means descriptor recognition method according to the present disclosure;

FIG. 5 is a schematic structural diagram of one embodiment of the new crime-means descriptor recognition apparatus according to the present disclosure;

FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing the electronic device of the present disclosure.

Detailed Description

The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

FIG. 1 illustrates an exemplary system architecture 100 to which embodiments of the new crime-means descriptor recognition method or the new crime-means descriptor recognition apparatus of the present disclosure may be applied.

As shown in fig. 1, system architecture 100 may include terminal device 101, network 102, and server 103. Network 102 is the medium used to provide communication links between terminal devices 101 and server 103. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

A user may use terminal device 101 to interact with server 103 over network 102 to receive or send messages and the like. Various communication client applications may be installed on the terminal device 101, such as an alarm receiving and handling record application, an application for recognizing new crime-means descriptors in alarm receiving and handling texts, and a web browser application.

The terminal device 101 may be hardware or software. When the terminal device 101 is hardware, it may be any of various electronic devices that have a display screen and support text input, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers. When the terminal device 101 is software, it may be installed in the electronic devices listed above, and may be implemented either as multiple pieces of software or software modules (for example, to provide a service for recognizing new crime-means descriptors in alarm receiving and handling texts) or as a single piece of software or software module. No specific limitation is imposed here.

The server 103 may be a server that provides various services, for example a background server that provides a crime-means recognition service for the alarm receiving and handling texts sent by the terminal device 101. The background server may analyze and otherwise process the received alarm receiving and handling texts and feed the processing result (for example, a set of crime-means descriptors) back to the terminal device.

In some cases, the new crime-means descriptor recognition method provided by the present disclosure may be executed jointly by the terminal device 101 and the server 103. For example, the step of "acquiring a recent theft-robbery-fraud historical alarm receiving and handling text set" may be executed by the terminal device 101, and the remaining steps may be executed by the server 103. The present disclosure is not limited in this respect. Accordingly, the new crime-means descriptor recognition apparatus may be provided in both the terminal device 101 and the server 103.

In some cases, the new crime-means descriptor recognition method provided by the present disclosure may be executed by the server 103. Accordingly, the new crime-means descriptor recognition apparatus may be provided in the server 103, in which case the system architecture 100 may omit the terminal device 101.

In some cases, the new crime-means descriptor recognition method provided by the present disclosure may be executed by the terminal device 101. Accordingly, the new crime-means descriptor recognition apparatus may be provided in the terminal device 101, in which case the system architecture 100 may omit the server 103.

The server 103 may be hardware or software. When the server 103 is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server 103 is software, it may be implemented either as multiple pieces of software or software modules (for example, to provide a crime-means recognition service for alarm receiving and handling texts) or as a single piece of software or software module. No specific limitation is imposed here.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow 200 of one embodiment of the new crime-means descriptor recognition method according to the present disclosure is shown. The new crime-means descriptor recognition method comprises the following steps:

Step 201, acquiring a recent theft-robbery-fraud historical alarm receiving and handling text set.

In the present embodiment, the execution body of the new crime-means descriptor recognition method (for example, the server shown in FIG. 1) may first acquire a recent theft-robbery-fraud historical alarm receiving and handling text set. Here, the recent theft-robbery-fraud historical alarm receiving and handling text set is a set of historical alarm receiving and handling texts that were generated within the most recent preset crime-means discovery duration and that describe theft, robbery, and fraud cases. A theft-robbery-fraud case is a case involving theft, robbery, or fraud. In practice, such cases often involve a variety of crime means. For example, theft cases may involve crime means such as sneak-in theft through doors and theft by climbing through windows. Robbery cases may involve crime means such as robbery at knifepoint, robbery on foot, and robbery by motorcycle. Fraud cases may involve crime means such as telecommunication fraud and financial fraud. Therefore, in order to recognize new crime-means descriptors that have not appeared before, the historical alarm receiving and handling texts of theft, robbery, and fraud cases need to be acquired.

Here, the preset crime-means discovery duration may be set in advance in various ways. For example, it may be a duration that a technician sets in advance and stores in the execution body according to the computing performance parameters of the execution body and the number of theft-robbery-fraud alarm receiving and handling texts historically generated per unit time. For example, the preset crime-means discovery duration may be 5 days, or 150 hours. It can be understood that the longer the preset crime-means discovery duration, the larger the amount of data in the acquired recent theft-robbery-fraud historical alarm receiving and handling text set, and accordingly the longer it takes to recognize new crime-means descriptors in that set. This delays the availability of the new crime-means descriptors and thus slows the response of public security organs to new crime means. Conversely, if the preset crime-means discovery duration is too short, the acquired text set may contain too little text data, so that no new crime-means descriptor is obtained or the descriptors obtained are not actual new crime-means descriptors. Therefore, setting the preset crime-means discovery duration requires a balance between the computation time required and the accuracy with which new crime-means descriptors are determined.
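Selecting the texts that fall within the most recent discovery duration can be sketched in Python as follows. The record layout (creation time, category label, text), the category labels, and the function name are hypothetical; the source does not describe how the texts are stored.

```python
from datetime import datetime, timedelta

# Hypothetical category labels; the source only names the three case types.
TARGET_CATEGORIES = {"theft", "robbery", "fraud"}


def recent_text_set(records, now, discovery_duration):
    """Return the texts generated within the most recent discovery duration.

    records: iterable of (created_at, category, text) tuples (assumed layout).
    """
    window_start = now - discovery_duration
    return [
        text
        for created_at, category, text in records
        if window_start <= created_at <= now and category in TARGET_CATEGORIES
    ]


now = datetime(2020, 5, 21, 12, 0)
records = [
    (datetime(2020, 5, 20, 9, 0), "theft", "text A"),
    (datetime(2020, 5, 10, 9, 0), "robbery", "text B"),  # outside a 5-day window
    (datetime(2020, 5, 19, 9, 0), "traffic", "text C"),  # not a target category
]
selected = recent_text_set(records, now, timedelta(days=5))
```

Passing `timedelta(hours=150)` instead of `timedelta(days=5)` corresponds to the second example duration mentioned in the text.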

Here, the execution body may acquire the recent theft-robbery-fraud historical alarm receiving and handling text set from local storage, or may acquire it remotely from another electronic device (for example, the terminal device shown in FIG. 1) connected to the execution body through a network.

Here, the historical alarm receiving and handling text may be text data compiled by an alarm receiver from the content of an alarm call, or text data compiled by an alarm handler from the alarm handling process. It may also be an alarm text that a user enters in an alarm application installed on a terminal device, or on a web page with an alarm function, and that is received from the terminal device.

It should be noted that the recent theft-robbery-fraud historical alarm receiving and handling text set acquired here may be the original set of historical alarm receiving and handling texts generated within the most recent preset crime-means discovery duration and describing theft, robbery, and fraud cases; it may also be the text set obtained by preprocessing that original set. By way of example, preprocessing may include, but is not limited to, removing invalid characters and converting between full-width and half-width characters. Invalid characters may be, for example, modal particles or function words.
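The two preprocessing operations named above can be sketched in Python as follows. The sketch uses Unicode NFKC normalization from the standard library as one way to convert full-width characters to half-width, and a small filler list that is purely hypothetical; the source does not specify either choice.

```python
import unicodedata

# Hypothetical single-character fillers (modal particles) to discard.
FILLERS = {"嗯", "啊", "呃"}


def preprocess(text: str, fillers=FILLERS) -> str:
    """Normalize full-width characters and drop invalid characters."""
    # NFKC maps full-width letters, digits, and punctuation to half-width forms.
    text = unicodedata.normalize("NFKC", text)
    # Drop control/format characters (Unicode category "C*", which includes
    # newlines) and the configured filler characters.
    return "".join(
        ch
        for ch in text
        if unicodedata.category(ch)[0] != "C" and ch not in fillers
    )


cleaned = preprocess("ＡＢＣ１２３，啊ok\u0000")
```

Multi-character function words would need a token-level filter after word segmentation rather than this character-level one.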

Step 202, performing word segmentation on each text in the recent theft-robbery-fraud historical alarm receiving and handling text set to obtain a corresponding word segmentation sequence, and generating a target word segmentation sequence set from the word segmentation sequences thus obtained.

In this embodiment, the execution body may perform word segmentation on each text in the recent theft-robbery-fraud historical alarm receiving and handling text set acquired in step 201 to obtain a corresponding word segmentation sequence, and may then generate a target word segmentation sequence set from the word segmentation sequences thus obtained.

It should be noted that how to segment text into words is prior art that has been widely studied and applied in this field and is not described in detail here. For example, a word segmentation method based on string matching, on understanding, or on statistics may be employed. For example, performing word segmentation on the historical alarm receiving and handling text "someone called the police to report that someone on a certain street was committing robbery at knifepoint" yields the word segmentation sequence "someone | call police | report | certain street | someone | hold knife | robbery".
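As an illustration of the string-matching family of methods mentioned above, the following Python sketch implements forward maximum matching against a dictionary. It is only one possible segmentation method and is not stated to be the one used by the disclosure; the function name, the `max_len` parameter, and the toy dictionary are hypothetical.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment text greedily, always taking the longest dictionary match."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Toy dictionary: "someone", "hold knife", "robbery" in Chinese.
toy_dictionary = {"有人", "持刀", "抢劫"}
segments = forward_max_match("有人持刀抢劫", toy_dictionary)
```

Adding a newly recognized descriptor to the dictionary, as in the optional embodiment that updates the preset word segmentation dictionary, would make later segmentation runs keep it as a single token.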

Step 203, generating a binary spliced word library from the binary spliced words formed by every two adjacent segmented words in the target word segmentation sequence set.

In this embodiment, the execution body may generate a binary spliced word library from the binary spliced words formed by every two adjacent segmented words in each target word segmentation sequence of the target word segmentation sequence set.

For example, assume that the target word segmentation sequence set is {"has | person | alarm | call | certain | street | has | person | hold | knife | robbery", "Zhang | San | receive | to | fraud | phone | and | give | fraud | suspect | person | transfer | five | ten thousand | yuan"}, where each segmented word is shown as a word-for-word gloss of the original Chinese. The binary spliced word library obtained through step 203 is then {"has person", "person alarm", "alarm call", "call certain", "certain street", "street has", "person hold", "hold knife", "knife robbery", "Zhang San", "San receive", "receive to", "to fraud", "fraud phone", "phone and", "and give", "give fraud", "fraud suspect", "suspect person", "person transfer", "transfer five", "five ten thousand", "ten thousand yuan"}.
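Step 203 can be sketched in a few lines of Python. The sketch represents each binary spliced word as a pair (x1, x2) so that its two component segmented words stay available for later statistics; this representation and the function name are assumptions, not part of the source.

```python
def build_binary_spliced_word_library(sequences):
    """Collect every pair of adjacent segmented words, without duplicates."""
    library = set()
    for seq in sequences:
        # zip(seq, seq[1:]) pairs each segmented word with its right neighbour,
        # so pairs never span two different sequences.
        for left, right in zip(seq, seq[1:]):
            library.add((left, right))
    return library


toy_sequences = [
    ["someone", "hold", "knife", "robbery"],
    ["hold", "knife", "escape"],
]
library = build_binary_spliced_word_library(toy_sequences)
```

Because the library is a set, a pair such as ("hold", "knife") that occurs in several sequences is stored once; how often it occurs is left to the word frequency calculation.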

Step 204, performing a recognition operation on each binary spliced word in the binary spliced word library.

In this embodiment, the execution body may perform the recognition operation on each binary spliced word in the binary spliced word library generated in step 203. Specifically, the recognition operation may include sub-step 2041 and sub-step 2042.

Sub-step 2041, calculating the word frequency, the degree of freedom, and the degree of solidification of the binary spliced word based on the target word segmentation sequence set.

In this embodiment, the execution body may calculate the word frequency, the degree of freedom, and the degree of solidification of the binary spliced word based on the target word segmentation sequence set in various ways.

The word frequency of the binary spliced word characterizes how frequently the binary spliced word occurs in the target word segmentation sequence set. The more frequently it occurs in the target word segmentation sequence set, the more likely it is to be a new crime-means descriptor.

In some optional implementations, calculating the word frequency of the binary concatenated word based on the target word segmentation sequence set may be: counting the sum of the numbers of occurrences of the binary concatenated word in the target word segmentation sequences of the target word segmentation sequence set, and determining the counted sum as the word frequency of the binary concatenated word.

In some optional implementations, calculating the word frequency of the binary concatenated word based on the target word segmentation sequence set may also be performed as follows: first, counting the sum of the numbers of occurrences of the binary concatenated word in the target word segmentation sequences of the target word segmentation sequence set; then, dividing the counted sum by the sum of the total numbers of occurrences of the segmented words corresponding to the target word segmentation sequence set, and determining the resulting ratio as the word frequency of the binary concatenated word. Here, the sum of the total numbers of occurrences of the segmented words corresponding to the target word segmentation sequence set is the sum of the numbers of occurrences of each segmented word in each target word segmentation sequence of the target word segmentation sequence set.
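The two optional word frequency calculations above can be sketched as follows. This is only an illustration: the function name `bigram_frequency`, the `relative` flag and the sample tokens are assumptions introduced here, with `relative=False` returning the raw number of occurrences and `relative=True` returning that number divided by the total number of segmented word occurrences.

```python
from collections import Counter


def bigram_frequency(bigram, sequences, relative=False):
    """Word frequency of a (left, right) pair in a set of token sequences."""
    # Count each adjacent pair inside every sequence.
    pair_counts = Counter(
        pair for tokens in sequences for pair in zip(tokens, tokens[1:])
    )
    n = pair_counts[bigram]
    if not relative:
        return n  # first variant: raw number of occurrences
    total = sum(len(tokens) for tokens in sequences)  # all token occurrences
    return n / total if total else 0.0  # second variant: ratio


# Hypothetical segmented texts (English stand-ins).
sequences = [
    ["someone", "holds", "knife", "robbery"],
    ["caller", "reports", "knife", "robbery"],
]
raw = bigram_frequency(("knife", "robbery"), sequences)
ratio = bigram_frequency(("knife", "robbery"), sequences, relative=True)
```

Whichever variant is chosen, the same variant should be applied to the single segmented words when the degree of solidification is calculated, so that the quantities being compared are on the same scale.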

The degree of solidification of the binary concatenated word represents how fixedly the two segmented words included in the binary concatenated word are combined in the target word segmentation sequences. The more fixedly the two segmented words are combined in the target word segmentation sequence set, the higher the probability that the binary concatenated word is a new case means descriptor.

Assume that the binary concatenated word library is X, and that each binary concatenated word x in the binary concatenated word library X is formed by concatenating a segmented word x₁ and a segmented word x₂, namely x = x₁x₂. Assume further that the word frequency of the binary concatenated word x in the target word segmentation sequence set is P(x).

In some optional implementations, the degree of solidification Agglomeration(x) of the binary concatenated word x may be calculated based on the target word segmentation sequence set as follows:

First, the word frequency P(x₁) of the segmented word x₁ and the word frequency P(x₂) of the segmented word x₂ in the target word segmentation sequence set may be determined. It should be noted that P(x₁) and P(x₂) may be determined by the same method as that used to determine the word frequency P(x) of the binary concatenated word x in the target word segmentation sequence set.

Then, the degree of solidification Agglomeration(x) of the binary concatenated word x may be calculated according to the following formula:

Agglomeration(x) = P(x) / (P(x₁)·P(x₂)) (formula 1)

Suppose that the numbers of occurrences of the binary concatenated word x, the segmented word x₁ and the segmented word x₂ in the target word segmentation sequences of the target word segmentation sequence set are n, n₁ and n₂ respectively, and that the sum of the total numbers of occurrences of the segmented words corresponding to the target word segmentation sequence set is N, where N is a positive integer. Then P(x), P(x₁) and P(x₂) may be n, n₁ and n₂ respectively, or may be n/N, n₁/N and n₂/N respectively.

As can be seen from formula 1, when P(x), P(x₁) and P(x₂) are n, n₁ and n₂ respectively, the degree of solidification Agglomeration(x) of the binary concatenated word x can be expressed as follows:

Agglomeration(x) = n / (n₁·n₂) (formula 2)

When P(x), P(x₁) and P(x₂) are n/N, n₁/N and n₂/N respectively, the degree of solidification Agglomeration(x) of the binary concatenated word x can be expressed as follows:

Agglomeration(x) = (n/N) / ((n₁/N)·(n₂/N)) = n·N / (n₁·n₂) (formula 3)

As can be seen from formulas 2 and 3, the degree of solidification Agglomeration(x) of the binary concatenated word x is inversely proportional to the number of occurrences n₁ of the segmented word x₁ and to the number of occurrences n₂ of the segmented word x₂ in the target word segmentation sequence set, and directly proportional to the number of occurrences n of the binary concatenated word x in the target word segmentation sequence set. Wherein:

Agglomeration(x) reaches its upper limit when n₁, n₂ and n are the same. If the word frequency is calculated by the method shown in formula 2, Agglomeration(x) is 1/n. Accordingly, if the word frequency is calculated by the method shown in formula 3, Agglomeration(x) is N/n. In this case, in the target word segmentation sequence set, whenever the segmented word x₁ occurs it occurs together with the segmented word x₂, and whenever the segmented word x₂ occurs it occurs together with the segmented word x₁; neither x₁ nor x₂ occurs alone. This indicates that the probability that x₁x₂ is used in combination as one word is high.

Conversely, Agglomeration(x) reaches its lower limit when n is 1 and n₁ and/or n₂ is greater than 1. If the word frequency is calculated by the method shown in formula 2, Agglomeration(x) is 1/(n₁·n₂). Accordingly, if the word frequency is calculated by the method shown in formula 3, Agglomeration(x) is N/(n₁·n₂). In this case, in the target word segmentation sequence set, the segmented word x₁ occurs together with the segmented word x₂ only once, and in all other cases x₁ or x₂ occurs alone. This indicates that the probability that x₁x₂ is used in combination as one word is low.

It can be understood that other methods may also be adopted to calculate the degree of solidification Agglomeration(x) of the binary concatenated word x based on the target word segmentation sequence set, as long as Agglomeration(x) is negatively correlated with the number of occurrences n₁ of the segmented word x₁ and with the number of occurrences n₂ of the segmented word x₂ in the target word segmentation sequence set, and positively correlated with the number of occurrences n of the binary concatenated word x in the target word segmentation sequence set. For example, the degree of solidification Agglomeration(x) of the binary concatenated word x may be calculated by using the following formula 4 or formula 5:

Agglomeration(x)＝P(x₁)+P(x₂)-P(x₁x₂) (formula 5)
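The following sketch illustrates one way to compute a degree of solidification that grows with n and shrinks with n₁ and n₂. It assumes, as an interpretation of the description above rather than a confirmed formula of the disclosure, that the value is the ratio P(x) / (P(x₁)·P(x₂)). The function name `agglomeration`, the `relative` flag and the sample tokens are hypothetical.

```python
from collections import Counter


def agglomeration(bigram, sequences, relative=True):
    """Assumed ratio P(x) / (P(x1) * P(x2)) for a (left, right) pair."""
    token_counts = Counter(t for tokens in sequences for t in tokens)
    pair_counts = Counter(
        pair for tokens in sequences for pair in zip(tokens, tokens[1:])
    )
    n = pair_counts[bigram]
    n1, n2 = token_counts[bigram[0]], token_counts[bigram[1]]
    if n == 0:
        return 0.0  # the pair never occurs, so there is nothing to score
    if not relative:
        return n / (n1 * n2)  # frequencies taken as raw counts
    total = sum(token_counts.values())  # N, all token occurrences
    return (n / total) / ((n1 / total) * (n2 / total))


# Hypothetical segmented texts: "hold" is always followed by "knife".
sequences = [
    ["hold", "knife", "robbery"],
    ["hold", "knife", "robbery"],
    ["knife", "shop"],
]
by_count = agglomeration(("hold", "knife"), sequences, relative=False)
by_ratio = agglomeration(("hold", "knife"), sequences)
```

With relative frequencies the value is scaled by N, so a threshold tuned for one variant cannot be reused for the other.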

The degree of freedom of the binary concatenated word represents the degree to which the binary concatenated word, taken as a whole, combines freely with other segmented words in the target word segmentation sequences. That is, if the words preceding and following the binary concatenated word are relatively fixed, the degree of freedom of the binary concatenated word can be considered low, and the binary concatenated word may not be a new case means descriptor. Conversely, if the words preceding and following the binary concatenated word vary considerably, the degree of freedom of the binary concatenated word can be considered high: the binary concatenated word can be freely combined with the surrounding words and may be a new case means descriptor.

Here, continuing with the above definitions of X, x, x₁, x₂, P(x), P(x₁) and P(x₂), in some optional implementations, the degree of freedom Free(x) of the binary concatenated word x may be calculated based on the target word segmentation sequence set as follows:

Firstly, generating the preceding adjacent word set Pre_x corresponding to the binary concatenated word x by using each segmented word that is located before and adjacent to the binary concatenated word x in each word segmentation sequence of the target word segmentation sequence set.

Secondly, counting the word frequency P(y) of each word y of the preceding adjacent word set Pre_x in the target word segmentation sequence set.

Thirdly, generating the following adjacent word set Post_x corresponding to the binary concatenated word x by using each segmented word that is located after and adjacent to the binary concatenated word x in each word segmentation sequence of the target word segmentation sequence set.

Fourthly, counting the word frequency P(z) of each word z of the following adjacent word set Post_x in the target word segmentation sequence set.

Fifthly, calculating the degree of freedom Free(x) of the binary concatenated word x according to the following formula:

Free(x) = min(H(Pre_x), H(Post_x)) (formula 8)

As can be seen from the above description and from formulas 6, 7 and 8, H(Pre_x) is calculated over the preceding adjacent word set Pre_x corresponding to the binary concatenated word x and reflects the degree of variation of that set; it can also be understood as the degree of freedom of the segmented words before the binary concatenated word x. Similarly, H(Post_x) is calculated over the following adjacent word set Post_x corresponding to the binary concatenated word x and reflects the degree of variation of that set; it can also be understood as the degree of freedom of the segmented words after the binary concatenated word x. The degree of freedom Free(x) of the binary concatenated word x is the smaller of H(Pre_x) and H(Post_x), that is, the smaller of the degree of variation of the preceding adjacent word set and the degree of variation of the following adjacent word set. The larger Free(x) is, the more the words before and after the binary concatenated word x vary, the more freely the binary concatenated word x combines with other words, and the higher the probability that the binary concatenated word x is a new case means descriptor.
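The five steps above can be sketched in Python as follows. Formulas 6 and 7 are not reproduced in this text, so the sketch assumes that H is the Shannon entropy of the neighboring words, computed from how often each neighbor occurs directly next to x. That choice, the natural logarithm, and the names `entropy` and `degree_of_freedom` are assumptions made for illustration only.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (natural log) of a Counter of neighbor occurrences."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no neighbors observed on this side
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


def degree_of_freedom(bigram, sequences):
    """min(H(Pre_x), H(Post_x)) for a (left, right) pair, as in formula 8."""
    pre, post = Counter(), Counter()
    for tokens in sequences:
        for i in range(len(tokens) - 1):
            if (tokens[i], tokens[i + 1]) != bigram:
                continue
            if i > 0:
                pre[tokens[i - 1]] += 1   # word just before the pair
            if i + 2 < len(tokens):
                post[tokens[i + 2]] += 1  # word just after the pair
    return min(entropy(pre), entropy(post))


# Hypothetical segmented texts: the pair ("x1", "x2") has varied neighbors
# in `varied` but always the same following word in `fixed_after`.
varied = [["a", "x1", "x2", "c"], ["b", "x1", "x2", "d"]]
fixed_after = [["a", "x1", "x2", "c"], ["b", "x1", "x2", "c"]]
free_varied = degree_of_freedom(("x1", "x2"), varied)
free_fixed = degree_of_freedom(("x1", "x2"), fixed_after)
```

Taking the minimum means that a pair scores high only when both its left and its right context vary; a pair that is always followed by the same word is treated as part of a longer fixed expression.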

Sub-step 2042, in response to determining that the binary concatenated word satisfies each condition in a preset new word discovery condition group, determining the binary concatenated word as a new case means descriptor.

Here, the execution subject may determine whether the binary concatenated word satisfies each condition in the preset new word discovery condition group. If it is determined that each condition is satisfied, the binary concatenated word may be determined as a new case means descriptor. The preset new word discovery condition group may include at least one of the following conditions: the word frequency of the binary concatenated word is greater than a preset word frequency threshold, the degree of solidification of the binary concatenated word is greater than a preset degree of solidification threshold, and the degree of freedom of the binary concatenated word is greater than a preset degree of freedom threshold.

Continuing the description in sub-step 2041, assume that T_p, T_a and T_f are the preset word frequency threshold, the preset degree of solidification threshold and the preset degree of freedom threshold respectively. The preset new word discovery condition group may then include at least one of the following conditions:

Condition one: P(x) > T_p;

Condition two: Agglomeration(x) > T_a;

Condition three: Free(x) > T_f.

In practice, the preset word frequency threshold, the preset degree of solidification threshold and the preset degree of freedom threshold may be set manually by a technician according to experience and stored in the execution subject.

As can be understood from the description in sub-step 2041, if each condition in the preset new word discovery condition group is satisfied, the binary concatenated word x is highly likely to be a new case means descriptor, and the binary concatenated word may therefore be determined as a new case means descriptor.
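The check in sub-step 2042 can be sketched as below. Because the condition group may include any non-empty subset of the three conditions, each threshold is optional in this sketch and only the configured ones are tested. The class name `NewWordConditions`, its field names and the numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NewWordConditions:
    """Preset thresholds; None means the condition is not in the group."""
    t_p: Optional[float] = None  # word frequency threshold T_p
    t_a: Optional[float] = None  # degree of solidification threshold T_a
    t_f: Optional[float] = None  # degree of freedom threshold T_f


def is_new_descriptor(p, agg, free, conditions):
    """True if every configured condition holds with a strict '>'."""
    pairs = [(p, conditions.t_p), (agg, conditions.t_a), (free, conditions.t_f)]
    active = [(value, limit) for value, limit in pairs if limit is not None]
    if not active:
        raise ValueError("the condition group must contain at least one condition")
    return all(value > limit for value, limit in active)


# Hypothetical thresholds chosen only for illustration.
conditions = NewWordConditions(t_p=2, t_a=1.5, t_f=0.5)
accepted = is_new_descriptor(3, 2.0, 0.7, conditions)
rejected = is_new_descriptor(3, 2.0, 0.5, conditions)  # Free(x) is not > T_f
```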

In some optional implementations, the preset case means discovery duration described in step 201 may be predetermined by a duration determination step as shown in fig. 3. Referring to fig. 3, fig. 3 shows a flow 300 of one embodiment of the duration determination step according to the present disclosure. The duration determination step comprises the following steps:

Here, the execution subject of the duration determination step may be the same as the execution subject of the above new case means descriptor recognition method. In this way, after determining the preset case means discovery duration, the execution subject of the duration determination step may store the determined duration locally and read it in the process of executing the new case means descriptor recognition method.

Alternatively, the execution subject of the duration determination step may be different from the execution subject of the new case means descriptor recognition method. In this way, after determining the preset case means discovery duration, the execution subject of the duration determination step may send the determined duration to the execution subject of the new case means descriptor recognition method. The execution subject of the new case means descriptor recognition method can then read the received preset case means discovery duration in the process of executing the new case means descriptor recognition method.

Step 301, for each candidate duration in the preset candidate duration set, performing a recognition accuracy determination operation.

Here, the preset candidate duration set may be a set consisting of at least one candidate duration. The time units of the candidate durations may be the same or different. For example, the time unit of the candidate duration may be day, hour, or both day and hour. As an example, the preset candidate duration set may be {1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days }.

Here, the execution subject of the duration determination step may execute the recognition accuracy determination operation for each candidate duration in the preset candidate duration set, and specifically, the recognition accuracy determination operation may include sub-steps 3011 to 3015:

Sub-step 3011, acquiring a historical alarm receiving and processing text set which is generated within the candidate duration and used for describing robbery and fraud cases, and a corresponding labeled new case means descriptor set.

In practice, the labeled new case means descriptor set, which describes new case means that have not appeared in history, can be labeled manually in the historical alarm receiving and processing text set which is generated within the candidate duration and used for describing robbery and fraud cases.

Here, assuming that the candidate duration is 3 days from the preset candidate duration set of the above example, the historical alarm receiving and processing text set generated in the last 3 days and used for describing robbery and fraud cases, together with the corresponding labeled new case means descriptor set, is acquired in sub-step 3011.

Sub-step 3012, performing word segmentation processing on each historical alarm receiving and processing text in the acquired historical alarm receiving and processing text set to obtain a corresponding word segmentation sequence, and generating a word segmentation sequence set corresponding to the candidate duration by using each word segmentation sequence obtained after the word segmentation processing.

Here, for how to segment the text to obtain the word segmentation sequence, reference may be made to the related description in step 202, which is not repeated herein.

Sub-step 3013, generating a binary concatenated word library corresponding to the candidate duration by using binary concatenated words each formed by two adjacent segmented words in the word segmentation sequence set corresponding to the candidate duration.

Sub-step 3014, for each binary concatenated word in the binary concatenated word library corresponding to the candidate duration, calculating the word frequency, the degree of freedom and the degree of solidification of the binary concatenated word based on the word segmentation sequence set corresponding to the candidate duration, and determining the binary concatenated word as a correctly recognized word in response to determining that the binary concatenated word satisfies each condition in the preset new word discovery condition group and belongs to the labeled new case means descriptor set, or in response to determining that the binary concatenated word does not satisfy at least one condition in the preset new word discovery condition group and does not belong to the labeled new case means descriptor set.

Here, the execution subject of the duration determination step may perform the above determination for each binary concatenated word in the binary concatenated word library corresponding to the candidate duration generated in sub-step 3013. That is, if the binary concatenated word is a new case means descriptor according to the preset new word discovery condition group, and is also a new case means descriptor according to the labeled new case means descriptor set acquired in sub-step 3011, the binary concatenated word is considered to be correctly recognized according to the preset new word discovery condition group and may be determined as a correctly recognized word. Similarly, if the binary concatenated word is not a new case means descriptor according to the preset new word discovery condition group, and is also not a new case means descriptor according to the labeled new case means descriptor set, the binary concatenated word is likewise considered to be correctly recognized and may be determined as a correctly recognized word. Conversely, if the binary concatenated word is a new case means descriptor according to the preset new word discovery condition group but is not a new case means descriptor according to the labeled new case means descriptor set, the binary concatenated word is considered to be wrongly recognized and may be determined as a wrongly recognized word. Similarly, if the binary concatenated word is not a new case means descriptor according to the preset new word discovery condition group but is a new case means descriptor according to the labeled new case means descriptor set, the binary concatenated word is also considered to be wrongly recognized and may be determined as a wrongly recognized word.

Sub-step 3015, determining the ratio of the number of correctly recognized words in the binary concatenated word library corresponding to the candidate duration to the number of binary concatenated words in the binary concatenated word library corresponding to the candidate duration as the recognition accuracy corresponding to the candidate duration.

Since it has been determined in sub-step 3014 whether each binary concatenated word in the binary concatenated word library corresponding to the candidate duration is a correctly recognized word, the number of correctly recognized words in that library divided by the number of binary concatenated words in that library may be determined in sub-step 3015 as the recognition accuracy corresponding to the candidate duration.

Step 302, determining the candidate duration with the highest recognition accuracy in the preset candidate duration set as the preset case means discovery duration.

After step 301, the recognition accuracy corresponding to each candidate duration in the preset candidate duration set has been determined. Here, the candidate duration with the highest recognition accuracy in the preset candidate duration set may be determined as the preset case means discovery duration.
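Sub-steps 3014 and 3015 together with step 302 can be sketched as follows. A binary concatenated word counts as correctly recognized when the prediction from the condition group and the manual label agree. The function names `recognition_accuracy` and `choose_duration` and all sample values are hypothetical and are used only for illustration.

```python
def recognition_accuracy(library, predicted_new, labeled_new):
    """Share of words whose predicted status matches the manual label."""
    if not library:
        return 0.0  # nothing to evaluate for this candidate duration
    correct = sum(
        1 for word in library if (word in predicted_new) == (word in labeled_new)
    )
    return correct / len(library)


def choose_duration(accuracy_by_duration):
    """Return the candidate duration with the highest recognition accuracy."""
    return max(accuracy_by_duration, key=accuracy_by_duration.get)


# Hypothetical library for one candidate duration.
library = {"a", "b", "c", "d"}
predicted_new = {"a", "b"}   # words satisfying the condition group
labeled_new = {"a", "c"}     # words labeled manually as new descriptors
accuracy = recognition_accuracy(library, predicted_new, labeled_new)

# Hypothetical accuracies per candidate duration, in days.
best = choose_duration({1: 0.6, 3: 0.8, 7: 0.7})
```

If two candidate durations reach the same accuracy, `max` returns the first one encountered; a shorter duration could be preferred explicitly in that case, since it involves less text to process.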

Because the preset case means discovery duration determined by the duration determination step shown in fig. 3 is the candidate duration with the highest recognition accuracy in the preset candidate duration set, when the recent robbery and fraud historical alarm receiving and processing text set is acquired in the process of executing the new case means descriptor recognition method, only the historical alarm receiving and processing texts generated within that duration and used for describing robbery and fraud cases need to be acquired. There is no need to acquire historical alarm receiving and processing texts generated over a longer period in order to improve the recognition accuracy, which reduces the amount of calculation, so that both calculation efficiency and recognition effect are taken into account.

The method provided by the above embodiment of the present disclosure first acquires a recent robbery and fraud historical alarm receiving and processing text set which is generated within the latest preset case means discovery duration and used for describing robbery and fraud cases. It then generates the binary concatenated word library corresponding to the recent robbery and fraud historical alarm receiving and processing text set. Finally, for each binary concatenated word in the binary concatenated word library, it calculates the word frequency, the degree of freedom and the degree of solidification of the binary concatenated word based on the target word segmentation sequence set, and determines the binary concatenated word as a new case means descriptor if the binary concatenated word is determined to satisfy each condition in the preset new word discovery condition group. With this new case means descriptor recognition method, no manual operation is needed in the whole process, which reduces the labor cost and time cost of discovering new case means descriptors. Further, the speed at which public security organs monitor and respond to newly emerging case means can be improved.

With further reference to fig. 4, a flow 400 of yet another embodiment of the new case means descriptor recognition method is shown. The flow 400 of the new case means descriptor recognition method comprises the following steps:

Step 401, acquiring a recent robbery and fraud historical alarm receiving and processing text set.

In this embodiment, the specific operation and the technical effect of step 401 are substantially the same as those of step 201 in the embodiment shown in fig. 2, and are not repeated herein.

Step 402, performing word segmentation processing on each recent robbery and fraud historical alarm receiving and processing text in the recent robbery and fraud historical alarm receiving and processing text set based on a preset word segmentation dictionary to obtain a corresponding word segmentation sequence, and generating a target word segmentation sequence set by using each word segmentation sequence obtained after the word segmentation processing.

In this embodiment, the execution subject of the new case means descriptor recognition method may adopt a dictionary-based word segmentation method to perform, based on the preset word segmentation dictionary, word segmentation processing on each recent robbery and fraud historical alarm receiving and processing text in the text set acquired in step 401, so as to obtain a corresponding word segmentation sequence, and may generate the target word segmentation sequence set by using each word segmentation sequence obtained after the word segmentation processing.

In practice, according to the scanning direction, dictionary-based word segmentation methods may include a forward maximum matching method, a reverse maximum matching method, and a bidirectional matching method. A dictionary-based word segmentation method matches the character string to be analyzed (for example, each recent robbery and fraud historical alarm receiving and processing text in step 402) against the entries in the preset word segmentation dictionary according to a certain strategy. If a character string is found in the dictionary, it is segmented out as a word, and matching then continues with the next character string.
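As an illustration of dictionary-based segmentation, the sketch below implements forward and reverse maximum matching over a small dictionary. The function names, the fallback of emitting a single character when no entry matches, and the sample dictionaries are assumptions made for this example and are not specified by the disclosure.

```python
def forward_max_match(text, dictionary):
    """Scan left to right, always taking the longest dictionary entry."""
    longest = max((len(word) for word in dictionary), default=1)
    tokens, start = [], 0
    while start < len(text):
        for size in range(min(longest, len(text) - start), 0, -1):
            piece = text[start:start + size]
            # Fall back to a single character if nothing matches.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                start += size
                break
    return tokens


def reverse_max_match(text, dictionary):
    """Scan right to left, always taking the longest dictionary entry."""
    longest = max((len(word) for word in dictionary), default=1)
    tokens, end = [], len(text)
    while end > 0:
        for size in range(min(longest, end), 0, -1):
            piece = text[end - size:end]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                end -= size
                break
    tokens.reverse()  # tokens were collected from the end of the text
    return tokens


# Hypothetical dictionary; the text means "someone is committing armed robbery".
dictionary = {"有人", "持刀", "抢劫", "持刀抢劫"}
forward = forward_max_match("有人在持刀抢劫", dictionary)

# Latin stand-in showing that the two scanning directions can disagree.
ambiguous = {"ab", "bc"}
left_to_right = forward_max_match("abc", ambiguous)
right_to_left = reverse_max_match("abc", ambiguous)
```

A bidirectional method would run both scans and choose between the two results by some rule, for example preferring the result with fewer tokens; the rule itself is a design choice left open here.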

Step 403, generating a binary concatenated word library by using binary concatenated words each formed by two adjacent segmented words in the target word segmentation sequence set.

Step 404, performing an identification operation on each binary concatenated word in the binary concatenated word library.

In this embodiment, the specific operations of step 403 and step 404 and the technical effects thereof are substantially the same as the operations and effects of step 203 and step 204 in the embodiment shown in fig. 2, and are not repeated herein.

Step 405, adding each binary concatenated word determined as a new case means descriptor in the binary concatenated word library to the preset word segmentation dictionary.

In this embodiment, the execution subject may add each binary concatenated word that was determined as a new case means descriptor in step 404, from the binary concatenated word library generated in step 403, to the preset word segmentation dictionary. Thus, when the new case means descriptor recognition method is executed the next time, the new case means descriptors recognized this time have already been added to the preset word segmentation dictionary, that is, the preset word segmentation dictionary has been updated, and the descriptors recognized this time will not be recognized as new case means descriptors again.

It should be noted that the preset word segmentation dictionary may be obtained by gradually adding new case means descriptors on the basis of a general word segmentation dictionary.

As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the new case means descriptor recognition method in this embodiment additionally includes the step of updating the preset word segmentation dictionary. The scheme described in this embodiment can therefore update the preset word segmentation dictionary in real time, so that a word recognized as a new case means descriptor this time, having already been added to the preset word segmentation dictionary, will not be recognized as a new case means descriptor again in the future.
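The effect of step 405 can be illustrated with the following sketch: once a recognized descriptor is added to the dictionary, a dictionary-based segmenter emits it as a single segmented word, so it no longer appears as a binary concatenated word in the next run. The helper names `segment` and `bigrams`, the forward maximum matching strategy and the sample dictionary are assumptions introduced for this example.

```python
def segment(text, dictionary):
    """Forward maximum matching; unmatched characters become single tokens."""
    longest = max((len(word) for word in dictionary), default=1)
    tokens, start = [], 0
    while start < len(text):
        for size in range(min(longest, len(text) - start), 0, -1):
            piece = text[start:start + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                start += size
                break
    return tokens


def bigrams(tokens):
    """Concatenate each pair of adjacent tokens into one string."""
    return {left + right for left, right in zip(tokens, tokens[1:])}


# Hypothetical general dictionary before any descriptor has been added.
dictionary = {"有人", "持刀", "抢劫"}
text = "有人持刀抢劫"

before = bigrams(segment(text, dictionary))  # "持刀抢劫" is a candidate here

# Step 405: suppose "持刀抢劫" was recognized as a new descriptor this run.
dictionary.add("持刀抢劫")

after = bigrams(segment(text, dictionary))   # now segmented as one word
```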

With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a new case means descriptor recognition apparatus. The apparatus embodiment corresponds to the method embodiment shown in fig. 2, and the apparatus is particularly applicable to various electronic devices.

As shown in fig. 5, the new case means descriptor recognition apparatus 500 of the present embodiment includes: an acquiring unit 501, a first generating unit 502, a second generating unit 503, and an identifying unit 504. The acquiring unit 501 is configured to acquire a recent robbery and fraud historical alarm receiving and processing text set, where the recent robbery and fraud historical alarm receiving and processing text set is a historical alarm receiving and processing text set which is generated within the latest preset case means discovery duration and used for describing robbery and fraud cases; the first generating unit 502 is configured to perform word segmentation processing on each recent robbery and fraud historical alarm receiving and processing text in the text set to obtain a corresponding word segmentation sequence, and to generate a target word segmentation sequence set by using each word segmentation sequence obtained after the word segmentation processing; the second generating unit 503 is configured to generate a binary concatenated word library by using binary concatenated words each formed by two adjacent segmented words in the target word segmentation sequence set; the identifying unit 504 is configured to perform the following identification operation on each binary concatenated word in the binary concatenated word library: calculating the word frequency, the degree of freedom and the degree of solidification of the binary concatenated word based on the target word segmentation sequence set, and determining the binary concatenated word as a new case means descriptor in response to determining that the binary concatenated word satisfies each condition in a preset new word discovery condition group, wherein the preset new word discovery condition group includes at least one of the following conditions: the word frequency of the binary concatenated word is greater than a preset word frequency threshold, the degree of solidification of the binary concatenated word is greater than a preset degree of solidification threshold, and the degree of freedom of the binary concatenated word is greater than a preset degree of freedom threshold.

In this embodiment, for the specific processing of the acquiring unit 501, the first generating unit 502, the second generating unit 503 and the identifying unit 504 of the new case means descriptor recognition apparatus 500 and the technical effects thereof, reference may be made to the related descriptions of step 201, step 202, step 203 and step 204 in the embodiment corresponding to fig. 2, respectively, which are not repeated herein.

In some optional embodiments, performing word segmentation on each recent robbery historical alarm receiving text in the recent robbery historical alarm receiving text set to obtain a corresponding word segmentation sequence may include: performing word segmentation on each recent robbery historical alarm receiving text in the recent robbery historical alarm receiving text set based on a preset word segmentation dictionary to obtain the corresponding word segmentation sequence. The device 500 may further include an adding unit 505 configured to add each binary spliced word in the binary spliced word library that is determined to be a new case means descriptor to the preset word segmentation dictionary.
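The interaction between dictionary-based segmentation and the adding unit can be sketched as follows. The disclosure does not name a segmentation algorithm, so forward maximum matching is used here purely as an assumption; `segment`, `add_new_descriptors`, and `max_len` are hypothetical names.

```python
def segment(text, dictionary, max_len=4):
    """Forward maximum matching: greedily take the longest dictionary word."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def add_new_descriptors(dictionary, descriptors):
    """Add each recognised binary spliced word (x1, x2) as one dictionary entry."""
    for left, right in descriptors:
        dictionary.add(left + right)
```

Once a recognised binary spliced word has been added, a later call to `segment` returns it as a single segmented word instead of two, so it no longer appears as a candidate in the next round of new word discovery.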

In some optional embodiments, the preset committing means discovery duration may be predetermined by the following duration determination steps. First, for each candidate duration in a preset candidate duration set, the following recognition accuracy determination operation is performed: acquiring a historical alarm receiving and processing text set that was generated within the candidate duration and that describes robbery-type cases, together with a corresponding annotated new case means descriptor set; performing word segmentation on each historical alarm receiving and processing text in the acquired set to obtain a corresponding word segmentation sequence, and generating the word segmentation sequence set corresponding to the candidate duration from the word segmentation sequences so obtained; generating the binary spliced word library corresponding to the candidate duration from the binary spliced words formed by two adjacent segmented words in the word segmentation sequence set corresponding to the candidate duration; for each binary spliced word in the binary spliced word library corresponding to the candidate duration, calculating the word frequency, the degree of freedom, and the degree of solidification of the binary spliced word based on the word segmentation sequence set corresponding to the candidate duration, and determining the binary spliced word to be a correctly recognized word either in response to determining that the binary spliced word satisfies every condition in the preset new word discovery condition set and belongs to the annotated new case means descriptor set, or in response to determining that the binary spliced word fails at least one condition in the preset new word discovery condition set and does not belong to the annotated new case means descriptor set; and determining the ratio of the number of correctly recognized words in the binary spliced word library corresponding to the candidate duration to the number of binary spliced words in that library as the recognition accuracy corresponding to the candidate duration. Second, the candidate duration with the highest recognition accuracy in the preset candidate duration set is determined as the preset committing means discovery duration.
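The duration determination steps can be sketched as follows, assuming that for each candidate duration the binary spliced word library, the set of words that satisfy every discovery condition, and the annotated descriptor set are already available. The function names and the shape of the `candidates` mapping are hypothetical.

```python
def recognition_accuracy(library, predicted, annotated):
    """Share of binary spliced words whose prediction agrees with the annotation."""
    if not library:
        return 0.0
    # A word is correct when it is both predicted and annotated,
    # or neither predicted nor annotated.
    correct = sum(1 for w in library if (w in predicted) == (w in annotated))
    return correct / len(library)


def choose_discovery_duration(candidates):
    """candidates maps a duration to (library, predicted set, annotated set)."""
    return max(candidates, key=lambda d: recognition_accuracy(*candidates[d]))
```

With Python's `max`, ties are resolved in favour of the first candidate encountered; the disclosure does not say how ties should be handled.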

In some optional embodiments, calculating, for each binary spliced word in the binary spliced word library, the word frequency, the degree of freedom, and the degree of solidification of the binary spliced word based on the target word segmentation sequence set may include: for each binary spliced word x in the binary spliced word library that is formed by splicing a segmented word x₁ and a segmented word x₂, performing the following calculation operation: counting the word frequency P(x) of the binary spliced word x in the target word segmentation sequence set, the word frequency P(x₁) of the segmented word x₁ in the target word segmentation sequence set, and the word frequency P(x₂) of the segmented word x₂ in the target word segmentation sequence set; calculating the degree of solidification Agglomeration(x) of the binary spliced word x; and calculating the degree of freedom Free(x) of the binary spliced word x according to the following formula, where Pre_x is the set of segmented words that immediately precede x in the word segmentation sequences of the target word segmentation sequence set and Post_x is the set of segmented words that immediately follow x:

Free(x) = min(H(Pre_x), H(Post_x))
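The degree-of-freedom formula above can be sketched in code. The formula for the degree of solidification and the definition of H are not reproduced in this text, so the sketch makes three loudly stated assumptions: word frequencies are relative frequencies, Agglomeration(x) takes the commonly used form P(x) / (P(x₁) · P(x₂)), and H is a Shannon-style sum −Σ P(y) log P(y) over the words of the adjacent word set. The function name `metrics` is hypothetical.

```python
import math
from collections import Counter


def metrics(sequences, x1, x2):
    """Return (P(x), assumed solidification, Free(x)) for the spliced word x1+x2."""
    words = Counter(w for seq in sequences for w in seq)
    total = sum(words.values())
    pairs = Counter(p for seq in sequences for p in zip(seq, seq[1:]))
    n_pairs = sum(pairs.values())

    def p(w):
        # Assumed: word frequency as a relative frequency over all tokens.
        return words[w] / total

    p_x = pairs[(x1, x2)] / n_pairs if n_pairs else 0.0
    # Assumed form; the source formula is missing from this text.
    solid = p_x / (p(x1) * p(x2)) if words[x1] and words[x2] else 0.0

    # Collect the words immediately before and after each occurrence of x.
    pre, post = set(), set()
    for seq in sequences:
        for i in range(len(seq) - 1):
            if (seq[i], seq[i + 1]) == (x1, x2):
                if i > 0:
                    pre.add(seq[i - 1])
                if i + 2 < len(seq):
                    post.add(seq[i + 2])

    def entropy(neighbours):
        # Assumed Shannon-style H over the word frequencies of the set.
        return -sum(p(y) * math.log(p(y)) for y in neighbours)

    return p_x, solid, min(entropy(pre), entropy(post))
```

Taking the minimum of the two values means that a candidate needs varied neighbours on both sides to score well, which is the behaviour the min in Free(x) expresses regardless of the exact definition of H.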

It should be noted that, for the implementation details and technical effects of each unit in the new case means descriptor recognition device provided by the present disclosure, reference may be made to the descriptions of other embodiments in the present disclosure; they are not repeated here.

Referring now to FIG. 6, a block diagram of a computer system 600 suitable for implementing the electronic device of the present disclosure is shown. The electronic device shown in FIG. 6 is only an example and should not impose any limitation on the functions or scope of use of the present disclosure.

As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An Input/Output (I/O) interface 605 is also connected to the bus 604.

The following components are connected to the I/O interface 605: an input section 606 including a touch screen, a tablet, a keyboard, a mouse, or the like; an output section 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 609 performs communication processing via a network such as the Internet.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from the network through the communication section 609. The above-described functions defined in the method of the present disclosure are performed when the computer program is executed by a Central Processing Unit (CPU) 601. It should be noted that the computer readable medium in the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or any combination thereof, including object-oriented programming languages such as Java, Smalltalk, C++, and Python, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in this disclosure may be implemented in software or in hardware. The described units may also be provided in a processor, which may be described as: a processor including an acquisition unit, a first generation unit, a second generation unit, and a recognition unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquisition unit may also be described as "a unit that acquires a recent robbery historical alarm receiving text set".
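As one illustration of a software implementation, the four units could be modelled as interchangeable callables held by a single object. This is only a sketch of one possible arrangement; the class name `DescriptorRecognizer`, its field names, and the type aliases are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple

TokenSeq = List[str]
Bigram = Tuple[str, str]


@dataclass
class DescriptorRecognizer:
    acquire: Callable[[], Iterable[str]]                    # acquisition unit
    segment: Callable[[str], TokenSeq]                      # used by the first generation unit
    build_library: Callable[[List[TokenSeq]], Set[Bigram]]  # second generation unit
    recognize: Callable[[Bigram, List[TokenSeq]], bool]     # recognition unit

    def run(self) -> Set[Bigram]:
        """Chain the units in the order acquire, segment, build, recognize."""
        texts = self.acquire()
        sequences = [self.segment(t) for t in texts]
        library = self.build_library(sequences)
        return {w for w in library if self.recognize(w, sequences)}
```

Because each unit is just a callable, the name given to a unit has no effect on behaviour, which mirrors the statement that unit names do not limit the units themselves.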

As another aspect, the present disclosure also provides a computer-readable medium, which may be contained in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: acquire a recent robbery historical alarm receiving text set, where the recent robbery historical alarm receiving text set is a set of historical alarm receiving texts that were generated within the most recent preset committing means discovery duration and that describe robbery-type cases; perform word segmentation on each recent robbery historical alarm receiving text in the recent robbery historical alarm receiving text set to obtain a corresponding word segmentation sequence, and generate a target word segmentation sequence set from the word segmentation sequences so obtained; generate a binary spliced word library from the binary spliced words formed by two adjacent segmented words in the target word segmentation sequences of the target word segmentation sequence set; and, for each binary spliced word in the binary spliced word library, perform the following recognition operation: calculating the word frequency, the degree of freedom, and the degree of solidification of the binary spliced word based on the target word segmentation sequence set, and determining the binary spliced word to be a new case means descriptor in response to determining that the binary spliced word satisfies every condition in a preset new word discovery condition set, where the preset new word discovery condition set includes at least one of the following conditions: the word frequency of the binary spliced word is greater than a preset word frequency threshold, the degree of solidification of the binary spliced word is greater than a preset degree of solidification threshold, and the degree of freedom of the binary spliced word is greater than a preset degree of freedom threshold.

The foregoing description covers only the preferred embodiments of the disclosure and the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combination of the above features, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept defined above, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.

Claims

1. A new case means descriptor recognition method, comprising:

acquiring a recent robbery historical alarm receiving text set, wherein the recent robbery historical alarm receiving text set is a set of historical alarm receiving texts that were generated within the most recent preset committing means discovery duration and that describe robbery-type cases;

performing word segmentation on each recent robbery historical alarm receiving text in the recent robbery historical alarm receiving text set to obtain a corresponding word segmentation sequence, and generating a target word segmentation sequence set from the word segmentation sequences so obtained;

generating a binary spliced word library from the binary spliced words formed by two adjacent segmented words in the target word segmentation sequences of the target word segmentation sequence set;

for each binary spliced word in the binary spliced word library, performing the following recognition operation: calculating the word frequency, the degree of freedom, and the degree of solidification of the binary spliced word based on the target word segmentation sequence set, and determining the binary spliced word to be a new case means descriptor in response to determining that the binary spliced word satisfies every condition in a preset new word discovery condition set, wherein the preset new word discovery condition set comprises at least one of the following conditions: the word frequency of the binary spliced word is greater than a preset word frequency threshold, the degree of solidification of the binary spliced word is greater than a preset degree of solidification threshold, and the degree of freedom of the binary spliced word is greater than a preset degree of freedom threshold.

2. The method of claim 1, wherein performing word segmentation on each recent robbery historical alarm receiving text in the recent robbery historical alarm receiving text set to obtain a corresponding word segmentation sequence comprises:

performing word segmentation on each recent robbery historical alarm receiving text in the recent robbery historical alarm receiving text set based on a preset word segmentation dictionary to obtain the corresponding word segmentation sequence; and

the method further comprises:

adding each binary spliced word in the binary spliced word library that is determined to be a new case means descriptor to the preset word segmentation dictionary.

3. The method according to claim 1 or 2, wherein the preset committing means discovery duration is predetermined by the following duration determination steps:

for each candidate duration in a preset candidate duration set, performing the following recognition accuracy determination operation: acquiring a historical alarm receiving and processing text set that was generated within the candidate duration and that describes robbery-type cases, together with a corresponding annotated new case means descriptor set; performing word segmentation on each historical alarm receiving and processing text in the acquired set to obtain a corresponding word segmentation sequence, and generating the word segmentation sequence set corresponding to the candidate duration from the word segmentation sequences so obtained; generating the binary spliced word library corresponding to the candidate duration from the binary spliced words formed by two adjacent segmented words in the word segmentation sequence set corresponding to the candidate duration; for each binary spliced word in the binary spliced word library corresponding to the candidate duration, calculating the word frequency, the degree of freedom, and the degree of solidification of the binary spliced word based on the word segmentation sequence set corresponding to the candidate duration, and determining the binary spliced word to be a correctly recognized word either in response to determining that the binary spliced word satisfies every condition in the preset new word discovery condition set and belongs to the annotated new case means descriptor set, or in response to determining that the binary spliced word fails at least one condition in the preset new word discovery condition set and does not belong to the annotated new case means descriptor set; determining the ratio of the number of correctly recognized words in the binary spliced word library corresponding to the candidate duration to the number of binary spliced words in that library as the recognition accuracy corresponding to the candidate duration;

and determining the candidate duration with the highest recognition accuracy in the preset candidate duration set as the preset committing means discovery duration.

4. The method of claim 3, wherein calculating, for each binary spliced word in the binary spliced word library, the word frequency, the degree of freedom, and the degree of solidification of the binary spliced word based on the target word segmentation sequence set comprises:

for each binary spliced word x in the binary spliced word library that is formed by splicing a segmented word x₁ and a segmented word x₂, performing the following calculation operation:

counting the word frequency P(x) of the binary spliced word x in the target word segmentation sequence set, the word frequency P(x₁) of the segmented word x₁ in the target word segmentation sequence set, and the word frequency P(x₂) of the segmented word x₂ in the target word segmentation sequence set;

calculating the degree of solidification Agglomeration(x) of the binary spliced word x according to the following formula:

generating a preceding adjacent word set Pre_x corresponding to the binary spliced word x from the segmented words that are located before and adjacent to the binary spliced word x in the word segmentation sequences of the target word segmentation sequence set;

counting the word frequency P(y), in the target word segmentation sequence set, of each word y in the preceding adjacent word set Pre_x;

generating a following adjacent word set Post_x corresponding to the binary spliced word x from the segmented words that are located after and adjacent to the binary spliced word x in the word segmentation sequences of the target word segmentation sequence set;

counting the word frequency P(z), in the target word segmentation sequence set, of each word z in the following adjacent word set Post_x;

calculating the degree of freedom Free(x) of the binary spliced word x according to the following formula:

Free(x) = min(H(Pre_x), H(Post_x))

5. A new case means descriptor recognition device, comprising:

an acquisition unit configured to acquire a recent robbery historical alarm receiving text set, wherein the recent robbery historical alarm receiving text set is a set of historical alarm receiving texts that were generated within the most recent preset committing means discovery duration and that describe robbery-type cases;

a first generation unit configured to perform word segmentation on each recent robbery historical alarm receiving text in the recent robbery historical alarm receiving text set to obtain a corresponding word segmentation sequence, and to generate a target word segmentation sequence set from the word segmentation sequences so obtained;

a second generation unit configured to generate a binary spliced word library from the binary spliced words formed by two adjacent segmented words in the target word segmentation sequences of the target word segmentation sequence set;

a recognition unit configured to perform the following recognition operation for each binary spliced word in the binary spliced word library: calculating the word frequency, the degree of freedom, and the degree of solidification of the binary spliced word based on the target word segmentation sequence set, and determining the binary spliced word to be a new case means descriptor in response to determining that the binary spliced word satisfies every condition in a preset new word discovery condition set, wherein the preset new word discovery condition set comprises at least one of the following conditions: the word frequency of the binary spliced word is greater than a preset word frequency threshold, the degree of solidification of the binary spliced word is greater than a preset degree of solidification threshold, and the degree of freedom of the binary spliced word is greater than a preset degree of freedom threshold.

6. The apparatus of claim 5, wherein performing word segmentation on each recent robbery historical alarm receiving text in the recent robbery historical alarm receiving text set to obtain a corresponding word segmentation sequence comprises:

performing word segmentation on each recent robbery historical alarm receiving text in the recent robbery historical alarm receiving text set based on a preset word segmentation dictionary to obtain the corresponding word segmentation sequence; and

the apparatus further comprises:

an adding unit configured to add each binary spliced word in the binary spliced word library that is determined to be a new case means descriptor to the preset word segmentation dictionary.

7. The apparatus of claim 5 or 6, wherein the preset committing means discovery duration is predetermined by the following duration determination steps:

8. The apparatus of claim 7, wherein calculating, for each binary spliced word in the binary spliced word library, the word frequency, the degree of freedom, and the degree of solidification of the binary spliced word based on the target word segmentation sequence set comprises:

counting the word frequency P(y), in the target word segmentation sequence set, of each word y in the preceding adjacent word set Pre_x;

counting the word frequency P(z), in the target word segmentation sequence set, of each word z in the following adjacent word set Post_x;

Free(x) = min(H(Pre_x), H(Post_x))

9. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-4.

10. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-4.