CN110399489A - Chat data segmentation method, device and storage medium

Info

Publication number
CN110399489A
Authority
CN
China
Prior art keywords
paragraph
segmenting
staged
sentence
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910611047.4A
Other languages
Chinese (zh)
Other versions
CN110399489B (en)
Inventor
陈志明
庄灿波
郑伟斌
苏玉海
赵建强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meiya Pico Information Co Ltd
Original Assignee
Xiamen Meiya Pico Information Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meiya Pico Information Co Ltd
Priority to CN201910611047.4A (patent CN110399489B)
Publication of CN110399489A
Application granted
Publication of CN110399489B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2132 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a chat data segmentation method and device. The method comprises the following steps. S1: split the chat data to obtain multiple pre-segmented paragraphs. S2: using the similarity between the sentence vectors of a topic-switching feature sentence library and the sentence vectors of a pre-segmented paragraph, judge whether the pre-segmented paragraph contains content on different topics; if so, re-segment the pre-segmented paragraph to obtain re-segmented paragraphs and repeat step S2 on them; if not, proceed to the next step. S3: by calculating the paragraph vector similarity of adjacent re-segmented paragraphs, judge whether the re-segmented paragraphs include adjacent paragraphs with related content; if so, merge them to obtain final segmented paragraphs and repeat step S3 on the final segmented paragraphs; if not, take the re-segmented paragraphs as the final segmented paragraphs. The method can effectively segment chat data covering different topics and improves the efficiency of chat data processing.

Description

Chat data segmentation method, device and storage medium
Technical field
The present invention relates to the field of data processing, and in particular to a chat data segmentation method, device and storage medium.
Background art
With the rapid development of social networks, chat data now exists in very large volumes. A single chat record consists of many messages, from as few as several to as many as hundreds of thousands, and covers a wide variety of content. Analysing and processing this much chat data is therefore very difficult. For example, when tasks such as classification and topic analysis are run directly on an entire chat record, it is hard to obtain good results, because chat records vary widely in length and cover many topics. Analysis and mining of chat data therefore usually require the data to be segmented first, so that the length of each paragraph falls within a certain range; ideally, content on different topics is split into different paragraphs and related context is kept in the same paragraph.
Most existing segmentation schemes for chat data split by a fixed time interval or a fixed number of messages. These methods keep the length of each paragraph within a certain range, but they tend to split related context into different paragraphs and to place unrelated topics in the same paragraph. Current segmentation approaches therefore do not achieve the desired effect on chat data, which greatly hampers its analysis and processing.
In view of this, developing a more systematic and more reasonable chat data segmentation method is one of the problems that needs to be solved.
Summary of the invention
Traditional chat data segmentation methods, as mentioned above, tend to place unrelated topics in the same paragraph and to split related context into different paragraphs. The purpose of the embodiments of the present application is to propose a chat data segmentation method, device and storage medium that solve the technical problems mentioned in the background section above.
In a first aspect, an embodiment of the present application provides a chat data segmentation method comprising the following steps:
S1: split the chat data to obtain multiple pre-segmented paragraphs;
S2: using the similarity between the sentence vectors of a topic-switching feature sentence library and the sentence vectors of a pre-segmented paragraph, judge whether the pre-segmented paragraph contains content on different topics; if so, re-segment the pre-segmented paragraph to obtain re-segmented paragraphs and repeat step S2 on the re-segmented paragraphs; if not, proceed to the next step; and
S3: by calculating the paragraph vector similarity of adjacent re-segmented paragraphs, judge whether the re-segmented paragraphs include adjacent paragraphs with related content; if so, merge the re-segmented paragraphs to obtain final segmented paragraphs and repeat step S3 on the final segmented paragraphs; if not, take the re-segmented paragraphs as the final segmented paragraphs.
In some embodiments, step S1 comprises the following steps. S11: set a splitting threshold based on the number of messages in the chat data, and split the chat data in chronological order according to the splitting threshold to obtain pre-segmented paragraphs. S12: set as one pre-segmented paragraph the chat data whose number of messages remaining after splitting, or whose total number of messages, is less than or equal to the splitting threshold.
Pre-segmenting with a splitting threshold yields a preliminary segmentation result, which facilitates subsequent processing.
In some embodiments, step S2 comprises the following steps:
S21: calculate the sentence vector of each feature sentence in the topic-switching feature sentence library and the sentence vectors of the sentences in the pre-segmented paragraph. S22: compare the sentence vectors in the pre-segmented paragraph with the sentence vectors of the feature sentences in the topic-switching feature sentence library to obtain sentence vector similarities S_11, S_12 … S_ij, where S_ij is the similarity between the i-th sentence in the pre-segmented paragraph and the j-th feature sentence in the feature sentence library. S23: set a segmentation threshold for re-segmenting within the pre-segmented paragraph and judge whether the sentence vector similarity S_ij is greater than or equal to the segmentation threshold; if so, re-segment the pre-segmented paragraph with its i-th sentence as the dividing line to obtain re-segmented paragraphs, and repeat steps S22 and S23 on the re-segmented paragraphs; if not, take the pre-segmented paragraph as a re-segmented paragraph.
By setting up a topic-switching feature sentence library and using sentences similar to topic-switching sentences as cut points for splitting paragraphs, content on different topics is effectively re-segmented. This is one of the important inventive points of the present application.
In some embodiments, step S3 comprises the following steps. S31: calculate the paragraph vector of each re-segmented paragraph. S32: calculate the paragraph vector similarities s_1, s_2, s_3 … s_t of adjacent re-segmented paragraphs, where t is the total number of paragraphs and s_m is the similarity between the m-th and (m+1)-th paragraphs. S33: set a merging threshold for merging adjacent re-segmented paragraphs and judge whether the paragraph vector similarity s_m is greater than or equal to the merging threshold; if so, merge the adjacent m-th and (m+1)-th paragraphs to obtain a final segmented paragraph, and repeat steps S32 and S33 on the final segmented paragraphs; if not, take the re-segmented paragraphs as the final segmented paragraphs.
Comparing the paragraph similarity of adjacent paragraphs and merging neighbouring paragraphs that contain related content reduces paragraph segmentation errors. This is one of the important inventive points of the present application.
In some embodiments, the algorithms used for the similarity calculation include the cosine similarity algorithm.
In a second aspect, an embodiment of the present application provides a chat data segmentation device, comprising:
a pre-segmentation module, configured to split the chat data to obtain multiple pre-segmented paragraphs;
a re-segmentation module, configured to judge, from the similarity between the sentence vectors of the topic-switching feature sentence library and the sentence vectors of a pre-segmented paragraph, whether the pre-segmented paragraph contains content on different topics; if so, to re-segment the pre-segmented paragraph to obtain re-segmented paragraphs and input the re-segmented paragraphs into the re-segmentation module again for re-segmentation; if not, to proceed to the next step; and
a merging module, configured to judge, by calculating the paragraph vector similarity of adjacent re-segmented paragraphs, whether the re-segmented paragraphs include adjacent paragraphs with related content; if so, to merge the re-segmented paragraphs to obtain final segmented paragraphs and input the final segmented paragraphs into the merging module for merging; if not, to take the re-segmented paragraphs as the final segmented paragraphs.
In some embodiments, the pre-segmentation module includes:
a paragraph splitting module, configured to set a splitting threshold based on the number of messages in the chat data and to split the chat data in chronological order according to the splitting threshold to obtain pre-segmented paragraphs;
a paragraph classification module, configured to set as one pre-segmented paragraph the chat data whose number of messages remaining after splitting, or whose total number of messages, is less than or equal to the splitting threshold.
In some embodiments, the re-segmentation module includes:
a sentence vector calculation module, configured to calculate the sentence vector of each feature sentence in the topic-switching feature sentence library and the sentence vectors of the sentences in the pre-segmented paragraph;
a first similarity calculation module, configured to compare the sentence vectors in the pre-segmented paragraph with the sentence vectors of the feature sentences in the topic-switching feature sentence library to obtain sentence vector similarities S_11, S_12 … S_ij, where S_ij is the similarity between the i-th sentence in the pre-segmented paragraph and the j-th feature sentence in the feature sentence library;
a re-segmentation calculation module, configured to set a segmentation threshold for re-segmenting within the pre-segmented paragraph and to judge whether the sentence vector similarity S_ij is greater than or equal to the segmentation threshold; if so, to re-segment the pre-segmented paragraph with its i-th sentence as the dividing line to obtain re-segmented paragraphs and input the re-segmented paragraphs into the first similarity calculation module and the merge calculation module for re-segmentation; if not, to take the pre-segmented paragraph as a re-segmented paragraph.
In some embodiments, the merging module includes:
a paragraph vector calculation module, configured to calculate the paragraph vector of each re-segmented paragraph;
a first similarity calculation module, configured to calculate the paragraph vector similarities s_1, s_2, s_3 … s_t of adjacent re-segmented paragraphs, where t is the total number of paragraphs and s_m is the similarity between the m-th and (m+1)-th paragraphs;
a merge calculation module, configured to set a merging threshold for merging adjacent re-segmented paragraphs and to judge whether the paragraph vector similarity s_m is greater than or equal to the merging threshold; if so, to merge the adjacent m-th and (m+1)-th paragraphs to obtain a final segmented paragraph and pass the final segmented paragraphs through the paragraph vector calculation module and the first similarity calculation module again for merging; if not, to take the re-segmented paragraphs as the final segmented paragraphs.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the method described in any implementation of the first aspect is realized.
The chat data segmentation method and device provided by the embodiments of the present application split the chat data to obtain multiple pre-segmented paragraphs; re-segment a pre-segmented paragraph when the similarity between the sentence vectors of the topic-switching feature sentence library and those of the pre-segmented paragraph shows that it contains content on different topics; and merge re-segmented paragraphs when the paragraph vector similarity of adjacent re-segmented paragraphs shows that they have related content. Chat data is thereby segmented so that the paragraph length after segmentation stays within a certain range, content on different topics is split into different paragraphs, and related context is kept in the same paragraph. This overcomes the shortcomings of previous segmentation methods when applied to chat data and is better suited to segmenting chat data.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a diagram of an exemplary system architecture to which an embodiment of the present application can be applied;
Fig. 2 is a flow diagram of the chat data segmentation method of an embodiment of the present invention;
Fig. 3 is a flow diagram of step S1 of the chat data segmentation method of an embodiment of the present invention;
Fig. 4 is a flow diagram of step S2 of the chat data segmentation method of an embodiment of the present invention;
Fig. 5 is a flow diagram of step S3 of the chat data segmentation method of an embodiment of the present invention;
Fig. 6 is a schematic diagram of the chat data segmentation device of an embodiment of the present invention;
Fig. 7 is a structural schematic diagram of a computer system of an electronic device suitable for implementing the embodiments of the present application.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 shows an exemplary system architecture 100 to which the chat data segmentation method or chat data segmentation device of the embodiments of the present application can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links or fiber optic cables.
A user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various applications can be installed on the terminal devices 101, 102, 103, such as data processing applications and file processing applications.
The terminal devices 101, 102, 103 can be hardware or software. When the terminal devices 101, 102, 103 are hardware, they can be various electronic devices, including but not limited to smartphones, tablet computers, laptop computers and desktop computers. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (for example, software or software modules for providing distributed services), or as a single piece of software or software module. No specific limitation is made here.
The server 105 can be a server that provides various services, such as a back-end data processing server that processes files or data uploaded by the terminal devices 101, 102, 103. The back-end data processing server can process the acquired files or data and generate a processing result (for example, a standardized document containing a standard header row and the data contained in the corresponding segments).
It should be noted that the chat data segmentation method provided by the embodiments of the present application can be executed by the server 105 or by the terminal devices 101, 102, 103; correspondingly, the chat data segmentation device can be provided in the server 105 or in the terminal devices 101, 102, 103.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are only illustrative. There can be any number of terminal devices, networks and servers, as the implementation requires. When the data to be processed does not need to be obtained remotely, the above system architecture may omit the network and include only the server or the terminal device.
With continued reference to Fig. 2, it shows a chat data segmentation method according to an embodiment of the present application. The method comprises the following steps:
S1: split the chat data to obtain multiple pre-segmented paragraphs.
In the present embodiment, as shown in Fig. 3, step S1 comprises the following steps:
S11: set a splitting threshold based on the number of messages in the chat data, and split the chat data in chronological order according to the splitting threshold to obtain pre-segmented paragraphs.
The purpose of pre-segmentation is to perform a preliminary split of the chat data, cutting a chat record of thousands of messages into pre-segmented paragraphs with fewer messages each. There are many possible pre-segmentation methods. In a preferred embodiment, segmentation is performed by setting a splitting threshold. The splitting threshold can be the maximum number of messages in a pre-segmented paragraph, or it can be set as a paragraph word count that has a certain regular relationship to the number of messages. Using the maximum number of messages per paragraph as the splitting threshold allows the chat data to be split, effectively and accurately, into preliminary pre-segmented paragraphs with a bounded number of messages.
S12: set as one pre-segmented paragraph the chat data whose number of messages remaining after splitting, or whose total number of messages, is less than or equal to the splitting threshold.
Specifically, after splitting according to the splitting threshold, the remaining chat data whose message count is less than or equal to the splitting threshold is treated as one pre-segmented paragraph; likewise, chat data whose total message count is less than or equal to the splitting threshold is treated as one pre-segmented paragraph.
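Steps S11 and S12 can be sketched in a few lines of Python. This is a minimal illustration, assuming the splitting threshold is the maximum number of messages per paragraph and that the messages are already in chronological order; the name `pre_segment` and its parameters are hypothetical and do not appear in the patent.

```python
from typing import List, Sequence


def pre_segment(messages: Sequence[str], max_messages: int) -> List[List[str]]:
    """Split chronologically ordered messages into pre-segmented paragraphs.

    max_messages plays the role of the splitting threshold (assumed here to be
    the maximum number of messages per paragraph).
    """
    if max_messages <= 0:
        raise ValueError("max_messages must be positive")
    # S11: cut every max_messages messages.
    # S12: the remaining tail (or a chat shorter than the threshold)
    # becomes a single pre-segmented paragraph on its own.
    return [
        list(messages[start:start + max_messages])
        for start in range(0, len(messages), max_messages)
    ]


chat = [f"message {k}" for k in range(7)]
paragraphs = pre_segment(chat, 3)
print([len(p) for p in paragraphs])  # [3, 3, 1]
```

A chat record with fewer messages than the threshold comes back as one paragraph, which matches the second case of S12.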
S2: using the similarity between the sentence vectors of the topic-switching feature sentence library and the sentence vectors of a pre-segmented paragraph, judge whether the pre-segmented paragraph contains content on different topics; if so, re-segment the pre-segmented paragraph to obtain re-segmented paragraphs and repeat step S2 on the re-segmented paragraphs; if not, proceed to step S3.
In the present embodiment, as shown in Fig. 4, step S2 specifically comprises the following steps:
S21: calculate the sentence vector of each feature sentence in the topic-switching feature sentence library and the sentence vectors of the sentences in the pre-segmented paragraph.
The feature sentences in the topic-switching feature sentence library can be collected manually over time, or sentences used for topic switching can be obtained by screening with a neural network. In other preferred embodiments, the feature sentences can also be obtained in other ways.
In the present embodiment, a sentence vector captures the contextual relations of the sentence. Sentence vectors can be calculated directly with doc2vec, or with algorithms such as bow (bag of words), tfidf or LDA.
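As an illustration of the simplest option named above, the following sketch builds bag-of-words sentence vectors in plain Python. It assumes whitespace tokenization, which is a simplification (Chinese chat text would need a word segmenter first); the function names and sample sentences are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def build_vocabulary(sentences: List[str]) -> Dict[str, int]:
    """Map each distinct token to a fixed vector position."""
    vocab: Dict[str, int] = {}
    for sentence in sentences:
        for token in sentence.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def bow_vector(sentence: str, vocab: Dict[str, int]) -> List[float]:
    """Count how often each vocabulary token occurs in the sentence."""
    counts = Counter(sentence.lower().split())
    vector = [0.0] * len(vocab)
    for token, count in counts.items():
        index = vocab.get(token)
        if index is not None:  # tokens outside the vocabulary are ignored
            vector[index] = float(count)
    return vector


# Hypothetical feature sentences and one chat sentence.
feature_sentences = ["by the way", "changing the subject"]
chat_sentences = ["by the way did you eat"]

# The vocabulary covers both sets so all vectors share the same dimensions.
vocab = build_vocabulary(feature_sentences + chat_sentences)
print(bow_vector("by the way", vocab))  # [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Building one shared vocabulary matters here: the vectors of the feature sentences and of the paragraph sentences can only be compared if they have the same dimensions.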
S22: compare the sentence vectors in the pre-segmented paragraph with the sentence vectors of the feature sentences in the topic-switching feature sentence library to obtain sentence vector similarities S_11, S_12 … S_ij, where S_ij is the similarity between the i-th sentence in the pre-segmented paragraph and the j-th feature sentence in the feature sentence library.
Specifically, the similarity comparison is calculated with a similarity algorithm. In a preferred embodiment, the cosine similarity algorithm is used: the cosine of the angle between two vectors in the vector space measures the difference between two individuals. The closer the cosine value is to 1, the more similar the two vectors are; the closer it is to 0, the less similar they are.
Suppose there is a sentence vector a = (a_1, a_2, a_3, …, a_n) from the pre-segmented paragraph and a sentence vector b = (b_1, b_2, b_3, …, b_n) from the feature sentence library. The cosine similarity S is then calculated as follows:
$$S = \cos\theta = \frac{\sum_{k=1}^{n} a_k b_k}{\sqrt{\sum_{k=1}^{n} a_k^{2}}\,\sqrt{\sum_{k=1}^{n} b_k^{2}}}$$
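The cosine similarity of two equal-length vectors can be computed as in the sketch below. Returning 0.0 for a zero vector is an assumption made here to avoid division by zero; the patent does not say how that case is handled.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: an all-zero vector is treated as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```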
S23: set a segmentation threshold for re-segmenting within the pre-segmented paragraph and judge whether the sentence vector similarity S_ij is greater than or equal to the segmentation threshold; if so, re-segment the pre-segmented paragraph with its i-th sentence as the dividing line to obtain re-segmented paragraphs, and repeat steps S22 and S23 on the re-segmented paragraphs; if not, take the pre-segmented paragraph as a re-segmented paragraph.
In the present embodiment, comparing the segmentation threshold with the sentence vector similarity S_ij allows the parts of a pre-segmented paragraph that contain multiple topics to be split further and accurately. The segmentation threshold is adjusted according to the re-segmentation results to obtain a better segmentation effect.
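Steps S22 and S23 can be sketched as a single pass over the sentences of one pre-segmented paragraph. Two assumptions are made that the patent text does not settle: the sentence that matches a topic-switching feature sentence opens the new paragraph, and a single pass that cuts at every matching sentence stands in for repeating S22 and S23 on each resulting paragraph. The function names and toy vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def re_segment(sentence_vecs: List[Vector],
               feature_vecs: List[Vector],
               threshold: float) -> List[List[int]]:
    """Group sentence indices; a new group starts at each topic-switch sentence."""
    segments: List[List[int]] = []
    current: List[int] = []
    for i, vec in enumerate(sentence_vecs):
        # S22/S23: sentence i is a dividing line if any S_ij >= threshold.
        is_switch = any(cosine_similarity(vec, f) >= threshold
                        for f in feature_vecs)
        if is_switch and current:
            segments.append(current)
            current = []
        # Assumption: the switching sentence begins the next paragraph.
        current.append(i)
    if current:
        segments.append(current)
    return segments


# Toy vectors: sentence 2 points in the same direction as the feature sentence.
sentences = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [1.0, 0.0]]
features = [[0.0, 1.0]]
print(re_segment(sentences, features, 0.9))  # [[0, 1], [2, 3]]
```

If no sentence reaches the threshold, the function returns one group containing every index, which corresponds to taking the pre-segmented paragraph as a re-segmented paragraph unchanged.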
S3: by calculating the paragraph vector similarity of adjacent re-segmented paragraphs, judge whether the re-segmented paragraphs include adjacent paragraphs with related content; if so, merge the re-segmented paragraphs to obtain final segmented paragraphs and repeat step S3 on the final segmented paragraphs; if not, take the re-segmented paragraphs as the final segmented paragraphs.
In the present embodiment, as shown in Fig. 5, step S3 comprises the following steps:
S31: calculate the paragraph vector of each re-segmented paragraph.
Specifically, a paragraph vector captures the contextual relations of the paragraph. Paragraph vectors can be calculated directly with doc2vec, or with algorithms such as bow, tfidf or LDA.
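As one possible reading of the tfidf option, the sketch below computes TF-IDF paragraph vectors in plain Python. The smoothed IDF formula, the whitespace tokenization and the name `tfidf_vectors` are assumptions chosen for illustration; the patent does not specify a TF-IDF variant.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf_vectors(paragraphs: List[str]) -> Tuple[List[str], List[List[float]]]:
    """Return the sorted vocabulary and one TF-IDF vector per paragraph."""
    tokenized = [p.lower().split() for p in paragraphs]
    vocab = sorted({token for tokens in tokenized for token in tokens})
    n = len(tokenized)
    # Document frequency: number of paragraphs containing each token.
    df = Counter(token for tokens in tokenized for token in set(tokens))
    # Smoothed inverse document frequency (an assumed variant).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty paragraphs
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


vocab, vectors = tfidf_vectors(["cat cat dog", "dog bird"])
print(vocab)  # ['bird', 'cat', 'dog']
```

In this toy input, "cat" gets a larger weight than "dog" in the first paragraph because it occurs more often there and appears in fewer paragraphs overall.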
S32: calculate the paragraph vector similarities s_1, s_2, s_3 … s_t of adjacent re-segmented paragraphs, where t is the total number of paragraphs and s_m is the similarity between the m-th and (m+1)-th paragraphs.
Specifically, the similarity comparison is calculated with a similarity algorithm. In a preferred embodiment, the cosine similarity algorithm is used: the cosine of the angle between two vectors in the vector space measures the difference between two individuals. The closer the cosine value is to 1, the more similar the two vectors are; the closer it is to 0, the less similar they are.
Suppose the paragraph vector of the m-th paragraph is a = (a_1, …, a_n) and the paragraph vector of the (m+1)-th paragraph is b = (b_1, …, b_n). The cosine similarity s_m is then calculated as follows:
$$s_m = \cos\theta = \frac{\sum_{k=1}^{n} a_k b_k}{\sqrt{\sum_{k=1}^{n} a_k^{2}}\,\sqrt{\sum_{k=1}^{n} b_k^{2}}}$$
S33: set a merging threshold for merging adjacent re-segmented paragraphs and judge whether the paragraph vector similarity s_m is greater than or equal to the merging threshold; if so, merge the adjacent m-th and (m+1)-th paragraphs to obtain a final segmented paragraph, and repeat steps S32 and S33 on the final segmented paragraphs; if not, take the re-segmented paragraphs as the final segmented paragraphs.
In the present embodiment, the merging threshold is used to merge re-segmented paragraphs that are adjacent and related in content, so that topic-related context is merged into the same final segmented paragraph, which improves the accuracy of segmentation.
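The merge loop of steps S31 to S33 can be sketched as follows. The sketch assumes that paragraph vectors are recomputed after every merge and that the first adjacent pair reaching the threshold is merged before the scan restarts; `merge_adjacent` and the toy vectorizer `toy_vectorize` are hypothetical names, and a real system would plug in doc2vec, bow, tfidf or LDA vectors instead.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def merge_adjacent(paragraphs: List[List[str]],
                   vectorize: Callable[[List[str]], Vector],
                   threshold: float) -> List[List[str]]:
    """Merge neighbouring paragraphs while some s_m >= threshold."""
    merged = [list(p) for p in paragraphs]
    changed = True
    while changed:
        changed = False
        vectors = [vectorize(p) for p in merged]            # S31
        for m in range(len(merged) - 1):
            s_m = cosine_similarity(vectors[m], vectors[m + 1])  # S32
            if s_m >= threshold:                              # S33
                merged[m:m + 2] = [merged[m] + merged[m + 1]]
                changed = True
                break  # vectors are stale now, so recompute and rescan
    return merged


def toy_vectorize(paragraph: List[str]) -> List[float]:
    """Toy stand-in for a paragraph vector: counts of two keywords."""
    words = " ".join(paragraph).lower().split()
    return [float(words.count("cat")), float(words.count("car"))]


chat = [["the cat sat"], ["my cat sleeps"], ["the car broke"]]
print(merge_adjacent(chat, toy_vectorize, 0.8))
# [['the cat sat', 'my cat sleeps'], ['the car broke']]
```

The loop always terminates because each merge reduces the number of paragraphs by one.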
With further reference to Fig. 6, as the realization to method shown in above-mentioned each figure, this application provides a kind of chat datas point One embodiment of section apparatus, the Installation practice is corresponding with embodiment of the method shown in Fig. 2, which can specifically apply In various electronic equipments.
The device includes:
Pre-staged module 1 is configured as carrying out chat data in the multiple pre-staged paragraphs of cutting acquisition;
Re-segmenting module 2 is configured as the feature statement library and the sentence vector phase of pre-staged paragraph switched by topic Judge whether pre-staged paragraph is the paragraph comprising different topic contents like degree, if so, carrying out re-segmenting to pre-staged paragraph Re-segmenting paragraph is obtained, and re-segmenting paragraph is inputted into re-segmenting module again and carries out re-segmenting, if it is not, then entering in next step;With And
Merging module 3 is configured as judging re-segmenting by the paragraph vector similarity for calculating adjacent re-segmenting paragraph Paragraph whether there is the relevant adjacent paragraph of content, if so, acquisition segmentation paragraph eventually is merged to re-segmenting paragraph, it will Segmentation paragraph input merging module merges eventually, if it is not, then using re-segmenting paragraph as segmentation paragraph eventually.
In the particular embodiment, pre-staged module 1 includes:
Paragraph cutting module is configured as that cutting threshold value is arranged by the message number of chat data, according to cutting threshold value Cutting is carried out to chat data according to chronological order and obtains pre-staged paragraph;
Paragraph classifying module, be configured as by after cutting remaining message number or total message number be less than or equal to cutting The chat data of threshold value is set as a pre-staged paragraph.
In the particular embodiment, re-segmenting module 2 includes:
Sentence vector calculation module is configured as the sentence of each feature sentence in the feature statement library for calculating topic switching Sentence vector in the pre-staged paragraph of vector sum;
First similarity calculation module is configured as the feature language of sentence vector and topic switching in pre-staged paragraph The sentence vector of feature sentence in sentence library carries out similarity-rough set, obtains sentence vector similarity S11, S12…Sij, wherein Sij For the similarity of j-th of feature sentence in i-th of sentence in pre-staged paragraph and feature statement library;
Re-segmenting computing module is configured as being arranged in the fragmentation threshold for carrying out re-segmenting in pre-staged paragraph, judges sentence Subvector similarity SijWhether fragmentation threshold is greater than or equal to, if so, using i-th of sentence in pre-staged paragraph as segmentation Line carries out re-segmenting to pre-staged paragraph and obtains re-segmenting paragraph, and by re-segmenting paragraph input the first similarity calculation module and Joint account module carries out re-segmenting, if it is not, then using pre-staged paragraph as re-segmenting paragraph.
In a specific embodiment, the merging module 3 includes:
A paragraph vector calculation module, configured to calculate the paragraph vector of each re-segmented paragraph;
A first similarity calculation module, configured to calculate the paragraph vector similarities s1, s2, s3 … st of adjacent re-segmented paragraphs, where t is the total number of paragraphs and sm is the similarity between the m-th and (m+1)-th paragraphs;
A merge calculation module, configured to set a merging threshold for merging adjacent re-segmented paragraphs and to judge whether the paragraph vector similarity sm is greater than or equal to the merging threshold; if so, the adjacent m-th and (m+1)-th paragraphs are merged to obtain a final segmented paragraph, and the paragraph vector calculation module and the first similarity calculation module are applied again to the final segmented paragraphs for further merging; if not, the re-segmented paragraph is taken as the final segmented paragraph.
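The merge loop can be sketched as follows. The text does not state how a paragraph vector is derived, so this illustration assumes it is the mean of the paragraph's sentence vectors; that choice, the use of cosine similarity, and every name below are assumptions rather than the patent's stated implementation. Each paragraph is assumed to be non-empty.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Assumed paragraph vector: component-wise mean of sentence vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def merge_paragraphs(
    paragraphs: Sequence[Sequence[Vector]], merging_threshold: float
) -> List[List[Vector]]:
    """Merge adjacent paragraphs until no adjacent pair is similar enough.

    Each paragraph is a non-empty list of sentence vectors. After every
    merge the paragraph vectors and similarities are recomputed, mirroring
    the repeated application of the calculation modules.
    """
    merged = [list(p) for p in paragraphs]
    while True:
        vecs = [mean_vector(p) for p in merged]
        # Find the first adjacent pair (m, m+1) with s_m >= threshold.
        hit = next(
            (
                m
                for m in range(len(vecs) - 1)
                if cosine(vecs[m], vecs[m + 1]) >= merging_threshold
            ),
            None,
        )
        if hit is None:
            return merged
        merged[hit:hit + 2] = [merged[hit] + merged[hit + 1]]
```

Note that t paragraphs give t - 1 adjacent pairs, so the loop only inspects indices m from the first paragraph to the second-to-last.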
The chat data segmentation method and device provided by the embodiments of the present application cut the chat data to obtain multiple pre-staged paragraphs; use the sentence vector similarity between the topic-switching feature sentence library and each pre-staged paragraph to judge whether the pre-staged paragraph contains different topic contents and, if so, re-segment it; and use the paragraph vector similarity of adjacent re-segmented paragraphs to judge whether content-related adjacent paragraphs exist and, if so, merge them. Chat data is thereby segmented so that paragraph length stays within a certain range, different topic contents are split into different paragraphs, and related context is kept in the same paragraph. This addresses the shortcomings of previous segmentation methods when applied to chat data, is better suited to segmenting chat data, and improves the accuracy of chat data segmentation.
Referring now to Fig. 7, a structural schematic diagram is shown of a computer system 700 suitable for implementing an electronic device of the embodiments of the present application (such as the server or terminal device shown in Fig. 1). The electronic device shown in Fig. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in Fig. 7, the computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processing according to a program stored in read-only memory (ROM) 702 or a program loaded from a storage section 708 into random access memory (RAM) 703. The RAM 703 also stores various programs and data required for the operation of the system 700. The CPU 701, ROM 702 and RAM 703 are connected to one another by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse and the like; an output section 707 including a liquid crystal display (LCD), a loudspeaker and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, optical disc, magneto-optical disk or semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such embodiments, the computer program may be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. When the computer program is executed by the central processing unit (CPU) 701, the above functions defined in the method of the present application are performed.
It should be noted that the computer-readable medium described in this application may be a computer-readable signal medium, a computer-readable medium, or any combination of the two. A computer-readable medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable media include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this application, a computer-readable medium may be any tangible medium that contains or stores a program, and the program may be used by or in connection with an instruction execution system, apparatus or device. In this application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than the computer-readable medium described above, which can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. Program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, RF and the like, or any suitable combination of the above.
Computer program code for carrying out the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In scenarios involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the architecture, functions and operations of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, program segment or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, they may be described as: a device including a pre-staged module, a re-segmenting module and a merging module. The names of these modules do not, under certain circumstances, limit the modules themselves; for example, the pre-staged module may also be described as "a module that segments chat data according to time and message count".
In another aspect, the present application also provides a computer-readable medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: S1: cut chat data to obtain multiple pre-staged paragraphs; S2: judge, by the sentence vector similarity between the topic-switching feature sentence library and the pre-staged paragraph, whether the pre-staged paragraph contains different topic contents; if so, re-segment the pre-staged paragraph to obtain re-segmented paragraphs and repeat step S2 on the re-segmented paragraphs; if not, proceed to the next step; and S3: judge, by calculating the paragraph vector similarity of adjacent re-segmented paragraphs, whether the re-segmented paragraphs include content-related adjacent paragraphs; if so, merge the re-segmented paragraphs to obtain final segmented paragraphs and repeat step S3 on the final segmented paragraphs; if not, take the re-segmented paragraphs as the final segmented paragraphs.
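Steps S2 and S3 each repeat until nothing changes, so the overall control flow can be sketched as two fixed-point loops after the initial cut. The sketch below shows only that control flow: the three stages are passed in as callables, and the names `run_until_stable`, `segment_chat`, `cut`, `split_once` and `merge_once` are hypothetical placeholders, not part of the patent.

```python
from typing import Callable, List

Paragraphs = List[List[str]]


def run_until_stable(
    step: Callable[[Paragraphs], Paragraphs], state: Paragraphs
) -> Paragraphs:
    """Apply `step` repeatedly until it returns its input unchanged."""
    while True:
        new_state = step(state)
        if new_state == state:
            return state
        state = new_state


def segment_chat(
    messages: List[str],
    cut: Callable[[List[str]], Paragraphs],
    split_once: Callable[[Paragraphs], Paragraphs],
    merge_once: Callable[[Paragraphs], Paragraphs],
) -> Paragraphs:
    """S1 cut, then S2 re-segment to a fixed point, then S3 merge to a fixed point."""
    pre_staged = cut(messages)                                 # S1
    re_segmented = run_until_stable(split_once, pre_staged)    # S2, repeated
    return run_until_stable(merge_once, re_segmented)          # S3, repeated
```

Here `split_once` and `merge_once` are expected to perform at most one round of splitting or merging and to return the paragraphs unchanged when no similarity reaches its threshold, which is what ends each loop.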
The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in this application.

Claims (10)

1. A chat data segmentation method, characterized by comprising the following steps:
S1: cutting chat data to obtain multiple pre-staged paragraphs;
S2: judging, by the sentence vector similarity between a topic-switching feature sentence library and the pre-staged paragraph, whether the pre-staged paragraph contains different topic contents; if so, re-segmenting the pre-staged paragraph to obtain re-segmented paragraphs and repeating step S2 on the re-segmented paragraphs; if not, proceeding to the next step; and
S3: judging, by calculating the paragraph vector similarity of adjacent re-segmented paragraphs, whether the re-segmented paragraphs include content-related adjacent paragraphs; if so, merging the re-segmented paragraphs to obtain final segmented paragraphs and repeating step S3 on the final segmented paragraphs; if not, taking the re-segmented paragraphs as the final segmented paragraphs.
2. The chat data segmentation method according to claim 1, characterized in that step S1 comprises the following steps:
S11: setting a cutting threshold based on the number of messages in the chat data, and cutting the chat data in chronological order according to the cutting threshold to obtain the pre-staged paragraphs;
S12: setting as one pre-staged paragraph any chat data whose remaining message count after cutting, or whose total message count, is less than or equal to the cutting threshold.
3. The chat data segmentation method according to claim 1, characterized in that step S2 comprises the following steps:
S21: calculating the sentence vector of each feature sentence in the topic-switching feature sentence library and the sentence vectors in the pre-staged paragraph;
S22: comparing the sentence vectors in the pre-staged paragraph with the sentence vectors of the feature sentences in the topic-switching feature sentence library to obtain sentence vector similarities S11, S12 … Sij, where Sij is the similarity between the i-th sentence in the pre-staged paragraph and the j-th feature sentence in the feature sentence library;
S23: setting a fragmentation threshold for re-segmenting the pre-staged paragraph, and judging whether the sentence vector similarity Sij is greater than or equal to the fragmentation threshold; if so, re-segmenting the pre-staged paragraph using its i-th sentence as the dividing line to obtain the re-segmented paragraphs, and repeating steps S22 and S23 on the re-segmented paragraphs; if not, taking the pre-staged paragraph as the re-segmented paragraph.
4. The chat data segmentation method according to claim 3, characterized in that step S3 comprises the following steps:
S31: calculating the paragraph vector of each re-segmented paragraph;
S32: calculating the paragraph vector similarities s1, s2, s3 … st of adjacent re-segmented paragraphs, where t is the total number of paragraphs and sm is the similarity between the m-th and (m+1)-th paragraphs;
S33: setting a merging threshold for merging adjacent re-segmented paragraphs, and judging whether the paragraph vector similarity sm is greater than or equal to the merging threshold; if so, merging the adjacent m-th and (m+1)-th paragraphs to obtain the final segmented paragraph, and repeating steps S32 and S33 on the final segmented paragraphs; if not, taking the re-segmented paragraph as the final segmented paragraph.
5. The chat data segmentation method according to claim 4, characterized in that the similarity calculation uses algorithms including a cosine similarity algorithm.
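For reference, the standard definition of cosine similarity between two n-dimensional vectors is given below. The symbols u, v, n and k are notation introduced here for illustration and do not appear in the claims.

```latex
\[
\operatorname{sim}(\mathbf{u},\mathbf{v})
  = \frac{\mathbf{u}\cdot\mathbf{v}}{\lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert}
  = \frac{\sum_{k=1}^{n} u_k v_k}
         {\sqrt{\sum_{k=1}^{n} u_k^{2}}\;\sqrt{\sum_{k=1}^{n} v_k^{2}}}
\]
```

On one plausible reading, u and v would be a sentence vector and a feature sentence vector when computing Sij, and two adjacent paragraph vectors when computing sm.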
6. A chat data segmentation device, characterized by comprising:
A pre-staged module, configured to cut chat data to obtain multiple pre-staged paragraphs;
A re-segmenting module, configured to judge, by the sentence vector similarity between a topic-switching feature sentence library and the pre-staged paragraph, whether the pre-staged paragraph contains different topic contents; if so, to re-segment the pre-staged paragraph to obtain re-segmented paragraphs and input the re-segmented paragraphs into the re-segmenting module again for re-segmenting; if not, to proceed to the next step; and
A merging module, configured to judge, by calculating the paragraph vector similarity of adjacent re-segmented paragraphs, whether the re-segmented paragraphs include content-related adjacent paragraphs; if so, to merge the re-segmented paragraphs to obtain final segmented paragraphs and input the final segmented paragraphs into the merging module for merging; if not, to take the re-segmented paragraphs as the final segmented paragraphs.
7. The chat data segmentation device according to claim 6, characterized in that the pre-staged module comprises:
A paragraph cutting module, configured to set a cutting threshold based on the number of messages in the chat data, and to cut the chat data in chronological order according to the cutting threshold to obtain the pre-staged paragraphs;
A paragraph classifying module, configured to set as one pre-staged paragraph any chat data whose remaining message count after cutting, or whose total message count, is less than or equal to the cutting threshold.
8. The chat data segmentation device according to claim 6, characterized in that the re-segmenting module comprises:
A sentence vector calculation module, configured to calculate the sentence vector of each feature sentence in the topic-switching feature sentence library and the sentence vectors in the pre-staged paragraph;
A first similarity calculation module, configured to compare the sentence vectors in the pre-staged paragraph with the sentence vectors of the feature sentences in the topic-switching feature sentence library, obtaining sentence vector similarities S11, S12 … Sij, where Sij is the similarity between the i-th sentence in the pre-staged paragraph and the j-th feature sentence in the feature sentence library;
A re-segmenting calculation module, configured to set a fragmentation threshold for re-segmenting the pre-staged paragraph and to judge whether the sentence vector similarity Sij is greater than or equal to the fragmentation threshold; if so, to re-segment the pre-staged paragraph using its i-th sentence as the dividing line to obtain the re-segmented paragraphs, and to input the re-segmented paragraphs to the first similarity calculation module and the merge calculation module for further re-segmenting; if not, to take the pre-staged paragraph as the re-segmented paragraph.
9. The chat data segmentation device according to claim 8, characterized in that the merging module comprises:
A paragraph vector calculation module, configured to calculate the paragraph vector of each re-segmented paragraph;
A first similarity calculation module, configured to calculate the paragraph vector similarities s1, s2, s3 … st of adjacent re-segmented paragraphs, where t is the total number of paragraphs and sm is the similarity between the m-th and (m+1)-th paragraphs;
A merge calculation module, configured to set a merging threshold for merging adjacent re-segmented paragraphs and to judge whether the paragraph vector similarity sm is greater than or equal to the merging threshold; if so, to merge the adjacent m-th and (m+1)-th paragraphs to obtain the final segmented paragraph, and to apply the paragraph vector calculation module and the first similarity calculation module again to the final segmented paragraphs for further merging; if not, to take the re-segmented paragraph as the final segmented paragraph.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 5.
CN201910611047.4A 2019-07-08 2019-07-08 Chat data segmentation method, device and storage medium Active CN110399489B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910611047.4A CN110399489B (en) 2019-07-08 2019-07-08 Chat data segmentation method, device and storage medium


Publications (2)

Publication Number Publication Date
CN110399489A (en) 2019-11-01
CN110399489B (en) 2022-06-17

Family

ID=68323952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910611047.4A Active CN110399489B (en) 2019-07-08 2019-07-08 Chat data segmentation method, device and storage medium

Country Status (1)

Country Link
CN (1) CN110399489B (en)

Also Published As

Publication number Publication date
CN110399489B (en) 2022-06-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant