CN109241296A - Method and apparatus for generating information - Google Patents

Method and apparatus for generating information

Info

Publication number
CN109241296A
Authority
CN
China
Prior art keywords
search term
vocabulary
lexical set
title text
target vocabulary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811075458.8A
Other languages
Chinese (zh)
Inventor
邓江东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201811075458.8A
Publication of CN109241296A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application disclose a method and apparatus for generating information. One specific embodiment of the method includes: obtaining the search terms input by users within a preset time period; for each search term among the obtained search terms, identifying the search term to determine whether it includes target vocabulary and, in response to determining that it does, adding the target vocabulary to a default lexical set; and generating a new lexical set based on the default lexical set to which the target vocabulary has been added. This embodiment makes the generated new lexical set more comprehensive and helps enrich its content.

Description

Method and apparatus for generating information
Technical field
Embodiments of the present application relate to the field of computer technology, and in particular to a method and apparatus for generating information.
Background art
In general, word segmentation refers to Chinese word segmentation, by which a sequence of Chinese characters is cut into one or more words.
There are currently many word segmentation methods. Among them, segmentation using a pre-established dictionary is widely applied. Such a dictionary is usually built from the content of early issues of the People's Daily.
Summary of the invention
Embodiments of the present application propose a method and apparatus for generating information.
In a first aspect, an embodiment of the present application provides a method for generating information. The method includes: obtaining the search terms input by users within a preset time period; for each search term among the obtained search terms, identifying the search term to determine whether it includes target vocabulary and, in response to determining that it does, adding the target vocabulary to a default lexical set; and generating a new lexical set based on the default lexical set to which the target vocabulary has been added.
In some embodiments, each search term among the obtained search terms corresponds to at least one title text, where a title text is a text that a user clicked after inputting the search term. Identifying the search term then includes: obtaining the at least one title text corresponding to the search term; and, for each title text among the at least one title text, matching the title text against the search term.
In some embodiments, identifying the search term to determine whether it includes target vocabulary includes: segmenting the search term to obtain a sequence of words; and, in response to determining that the obtained sequence of words includes at least two words, performing the following steps for each word among the at least two words: determining the incidence coefficient corresponding to the word and the word adjacent to it in the sequence of words, where the incidence coefficient characterizes the degree of association between the two; and, in response to determining that the incidence coefficient is greater than or equal to a preset threshold, determining that the search term includes target vocabulary, where the target vocabulary is the word formed by combining the word with the word adjacent to it in the sequence of words.
In some embodiments, identifying the search term to determine whether it includes target vocabulary includes: performing named entity recognition on the search term to obtain a recognition result, where the recognition result indicates whether the search term includes target vocabulary, and the target vocabulary is a named entity.
In some embodiments, performing named entity recognition on the search term to obtain a recognition result includes: performing named entity recognition on the search term using a pre-trained named entity recognition model to obtain the recognition result.
In a second aspect, an embodiment of the present application provides a method for word segmentation. The method includes: obtaining a search term input by a user; and segmenting the obtained search term based on a new lexical set generated by the method described in any embodiment of the first aspect, to obtain a word segmentation result.
In a third aspect, an embodiment of the present application provides an apparatus for generating information. The apparatus includes: a first acquisition unit configured to obtain the search terms input by users within a preset time period; a recognition unit configured to, for each search term among the obtained search terms, identify the search term to determine whether it includes target vocabulary and, in response to determining that it does, add the target vocabulary to a default lexical set; and a generation unit configured to generate a new lexical set based on the default lexical set to which the target vocabulary has been added.
In some embodiments, each search term among the obtained search terms corresponds to at least one title text, where a title text is a text that a user clicked after inputting the search term. The recognition unit then includes: an acquisition module configured to obtain, for each search term among the obtained search terms, the at least one title text corresponding to the search term; and a matching module configured to match, for each title text among the at least one title text, the title text against the search term.
In some embodiments, the recognition unit includes: a word segmentation module configured to segment each search term among the obtained search terms to obtain a sequence of words; and an execution module configured to, in response to determining that the obtained sequence of words includes at least two words, perform the following steps for each word among the at least two words: determining the incidence coefficient corresponding to the word and the word adjacent to it in the sequence of words, where the incidence coefficient characterizes the degree of association between the two; and, in response to determining that the incidence coefficient is greater than or equal to a preset threshold, determining that the search term includes target vocabulary, where the target vocabulary is the word formed by combining the word with the word adjacent to it in the sequence of words.
In some embodiments, the recognition unit is further configured to: for each search term among the obtained search terms, perform named entity recognition on the search term to obtain a recognition result, where the recognition result indicates whether the search term includes target vocabulary, and the target vocabulary is a named entity.
In some embodiments, performing named entity recognition on the search term to obtain a recognition result includes: performing named entity recognition on the search term using a pre-trained named entity recognition model to obtain the recognition result.
In a fourth aspect, an embodiment of the present application provides an apparatus for word segmentation. The apparatus includes: a second acquisition unit configured to obtain a search term input by a user; and a word segmentation unit configured to segment the obtained search term based on a new lexical set generated by the method described in any embodiment of the first aspect, to obtain a word segmentation result.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described in any embodiment of the first aspect and the second aspect.
In a sixth aspect, an embodiment of the present application provides a computer-readable medium storing a computer program which, when executed by a processor, implements the method described in any embodiment of the first aspect and the second aspect.
The method and apparatus for generating information provided by the embodiments of the present application obtain the search terms input by users within a preset time period, identify each of the obtained search terms to obtain target vocabulary, and add the obtained target vocabulary to a default lexical set to obtain a new lexical set. The search terms input by users are thereby put to effective use, which makes the generated new lexical set more comprehensive and helps enrich its content.
Brief description of the drawings
Other features, objects and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
Fig. 1 is a diagram of an exemplary system architecture to which an embodiment of the present application can be applied;
Fig. 2 is a flowchart of one embodiment of the method for generating information according to the present application;
Fig. 3 is a schematic diagram of an application scenario of the method for generating information according to the present application;
Fig. 4 is a flowchart of another embodiment of the method for generating information according to the present application;
Fig. 5 is a schematic structural diagram of one embodiment of the apparatus for generating information according to the present application;
Fig. 6 is a flowchart of one embodiment of the method for word segmentation according to the present application;
Fig. 7 is a schematic structural diagram of one embodiment of the apparatus for word segmentation according to the present application;
Fig. 8 is a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of the present application.
Specific embodiment
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the related invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the related invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method for generating information, the apparatus for generating information, the method for word segmentation, or the apparatus for word segmentation of the present application can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as language processing software, web browser applications, shopping applications, search applications, instant messaging tools, mailbox clients, and social platform software.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers. When they are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server that provides various services, for example a background server that supports the various applications installed on the terminal devices 101, 102, 103. The background server may generate a new lexical set from the user-input search terms sent by the terminal devices 101, 102, 103.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
It should be noted that the method for generating information and the method for word segmentation provided by the embodiments of the present application may be executed by the server 105 or by the terminal devices 101, 102, 103. Correspondingly, the apparatus for generating information and the apparatus for word segmentation may be provided in the server 105 or in the terminal devices 101, 102, 103.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely illustrative. Any number of terminal devices, networks and servers may be provided as the implementation requires.
With continued reference to Fig. 2, a process 200 of one embodiment of the method for generating information according to the present application is shown. The method for generating information includes the following steps:
Step 201, obtain the search terms input by users within a preset time period.
In the present embodiment, the executing subject of the method for generating information (for example, the server shown in Fig. 1) may obtain, through a wired or wireless connection, the search terms input by users within a preset time period. The preset time period may be a period set in advance by a technician, for example January 1, 2017 to January 1, 2018. A search term input by a user may be one that the user inputs using the executing subject itself or using a terminal communicatively connected to the executing subject (for example, a terminal device shown in Fig. 1). The users may be the users corresponding to the terminals communicatively connected to the executing subject. A search term may be a word, phrase, sentence, or the like used for searching.
Step 202, for each search term among the obtained search terms, identify the search term to determine whether it includes target vocabulary and, in response to determining that it does, add the target vocabulary to the default lexical set.
In the present embodiment, for each search term among the search terms obtained in step 201, the executing subject may identify the search term to determine whether it includes target vocabulary and, in response to determining that it does, add the target vocabulary to the default lexical set. The target vocabulary is vocabulary to be added to the default lexical set. The default lexical set is a lexical set predetermined by a technician and used for word segmentation.
In the present embodiment, for each search term among the obtained search terms, the executing subject may identify the search term using various methods to determine whether it includes target vocabulary.
In some optional implementations of the present embodiment, for each search term among the obtained search terms, the executing subject may identify the search term through the following steps to determine whether it includes target vocabulary:
Step 2021, segment the search term to obtain a sequence of words.
The sequence of words characterizes the result of the segmentation. Specifically, the words in the sequence are the words in the search term, arranged in the order in which they appear in the search term.
Here, the executing subject may segment the search term using various methods to obtain the sequence of words. For example, it may segment the search term with the forward maximum matching algorithm using the default lexical set; alternatively, it may segment the search term with a pre-trained word segmentation model.
Specifically, as an example, suppose the search term is "what is computer". After segmenting the search term, the executing subject may obtain the sequence of words "what"; "is"; "computer".
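The dictionary-based option in step 2021 can be illustrated with a minimal sketch of forward maximum matching. The function name `forward_max_match`, the toy lexical set, and the Chinese string (an assumed rendering of the "what is computer" example) are illustrative assumptions, not part of the application.

```python
def forward_max_match(text, vocabulary, max_len=None):
    """Segment `text` greedily, always taking the longest dictionary word."""
    if max_len is None:
        # Longest word in the lexical set; at least 1 so the scan always advances.
        max_len = max(1, max((len(w) for w in vocabulary), default=1))
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical default lexical set and search term.
default_lexical_set = {"什么", "是", "计算机"}
print(forward_max_match("什么是计算机", default_lexical_set))
```

Characters that match no dictionary entry come out as single-character words, which is why a new word that is missing from the default lexical set tends to surface as a run of adjacent short words that the next step can try to merge.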
Step 2022, in response to determining that the obtained sequence of words includes at least two words, perform the following steps for each word among the at least two words: determine the incidence coefficient corresponding to the word and the word adjacent to it in the sequence of words; and, in response to determining that the incidence coefficient is greater than or equal to a preset threshold, determine that the search term includes target vocabulary.
Here, the target vocabulary is the word formed by combining the word with the word adjacent to it in the sequence of words. The incidence coefficient characterizes the degree of association between the word and the word adjacent to it. The preset threshold may be a value set in advance by a technician.
Specifically, for each word among the at least two words, the executing subject may use various methods to determine the incidence coefficient corresponding to the word and the word adjacent to it in the sequence of words. For example, the executing subject may obtain a preset text set and, based on that set, use the pointwise mutual information (PMI) method to compute the pointwise mutual information of the word and the word adjacent to it in the sequence of words, and take the result as their incidence coefficient. The preset texts in the preset text set may be various texts collected in advance by a technician for determining incidence coefficients, such as search terms input by users, articles on websites, and news in newspapers.
It should be noted that the pointwise mutual information method is a well-known technique that is widely studied and applied at present, and is not described further here.
Optionally, for each word among the at least two words, the executing subject may also determine the incidence coefficient corresponding to the word and the word adjacent to it in the sequence of words through the following steps: first, the executing subject merges the word and the word adjacent to it in the sequence of words into one candidate word; then, it determines the number of times the merged candidate word occurs in the preset text set; finally, based on the number of occurrences, it determines the incidence coefficient corresponding to the word and the word adjacent to it.
Here, the executing subject may use various methods to determine the incidence coefficient from the number of occurrences. For example, the number of occurrences may be taken directly as the incidence coefficient; alternatively, the number of occurrences may be divided by the number of preset texts in the preset text set, and the resulting quotient taken as the incidence coefficient.
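The count-based alternative can be sketched in a few lines. The function name `count_based_coefficient`, the `normalize` flag, and the sample texts are hypothetical; the sketch assumes the preset texts are plain strings and counts non-overlapping occurrences of the merged candidate word.

```python
def count_based_coefficient(word, next_word, preset_texts, normalize=True):
    """Incidence coefficient from how often the merged candidate word occurs."""
    candidate = word + next_word  # merge the two adjacent words
    # str.count counts non-overlapping occurrences within each text.
    occurrences = sum(text.count(candidate) for text in preset_texts)
    if not normalize:
        return occurrences  # use the raw count directly
    # Otherwise divide by the number of preset texts in the set.
    return occurrences / len(preset_texts) if preset_texts else 0.0


# Hypothetical preset text set.
preset_texts = ["区块链是什么", "区块链技术入门", "今天天气"]
print(count_based_coefficient("区块", "链", preset_texts, normalize=False))
print(count_based_coefficient("区块", "链", preset_texts))
```

The raw count grows with the size of the text set, so a threshold tuned on one set may not carry over to another; the quotient is less sensitive to that, which may be why both variants are offered.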
In some optional implementations of the present embodiment, for each search term among the obtained search terms, the executing subject may also identify the search term through the following step to determine whether it includes target vocabulary: performing named entity recognition on the search term to obtain a recognition result. The recognition result indicates whether the search term includes target vocabulary and, when it indicates that the search term does, the recognition result may include the target vocabulary. Here, the target vocabulary is a named entity. Named entities are person names, organization names, place names, and all other entities identified by a name. In this implementation, an entity is a word.
Specifically, the executing subject may perform named entity recognition on the search term using various methods to determine whether it includes target vocabulary. For example, a technician may establish a named entity set in advance; the executing subject may then match the search term against the named entities in that set to determine whether the search term includes a named entity, thereby obtaining the recognition result.
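A minimal sketch of the set-matching variant follows. It assumes that "matching" means a substring test against each entry of a pre-established named entity set; the function name, the result layout, and the sample entities are illustrative only.

```python
def recognize_named_entities(search_term, named_entities):
    """Return a recognition result for one search term.

    The result states whether the term contains any entity from the
    pre-established set and lists the entities found as target vocabulary.
    """
    found = [e for e in named_entities if e and e in search_term]
    # Order by position in the search term; longer entities first on ties.
    found.sort(key=lambda e: (search_term.find(e), -len(e)))
    return {"contains_target": bool(found), "target_vocabulary": found}


# Hypothetical named entity set.
named_entity_set = {"Beijing", "ByteDance", "Shanghai"}
print(recognize_named_entities("ByteDance office in Beijing", named_entity_set))
print(recognize_named_entities("weather today", named_entity_set))
```

A plain substring test is cheap but cannot find entities outside the set, which is the gap the model-based implementation is meant to cover.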
In some optional implementations of the present embodiment, for each search term among the obtained search terms, the executing subject may identify the search term using a pre-trained named entity recognition model to obtain the recognition result. The named entity recognition model may be obtained by training based on various existing language processing models, such as a CRF (Conditional Random Field) or an HMM (Hidden Markov Model). It should be noted that methods for training a named entity recognition model are well-known techniques that are widely studied and applied at present, and are not described further here.
Step 203, generate a new lexical set based on the default lexical set to which the target vocabulary has been added.
In the present embodiment, the executing subject may generate a new lexical set based on the default lexical set to which the target vocabulary has been added.
Specifically, the executing subject may directly take the default lexical set with the target vocabulary added as the new lexical set, or it may process that set and take the processed set as the new lexical set. Here, the processing may be any processing specified by a technician, for example checking whether the default lexical set with the target vocabulary added contains identical words and, if so, keeping one copy of each such word and deleting the others.
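The duplicate-removal example in step 203 might look like the sketch below, which assumes the lexical set is kept as an ordered list of words. The function name and sample data are hypothetical.

```python
def generate_new_lexical_set(default_lexical_set, target_vocabulary):
    """Add the target vocabulary, then keep one copy of each identical word."""
    combined = list(default_lexical_set) + list(target_vocabulary)
    # dict.fromkeys keeps the first occurrence of each word, in order.
    return list(dict.fromkeys(combined))


# Hypothetical data: "区块链" is found twice and "计算机" is already present.
default_lexical_set = ["什么", "是", "计算机"]
target_vocabulary = ["区块链", "计算机", "区块链"]
print(generate_new_lexical_set(default_lexical_set, target_vocabulary))
```

If the order of the words does not matter, storing the lexical set as a Python `set` from the start makes the deduplication implicit.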
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for generating information according to the present embodiment. In the application scenario of Fig. 3, the server 301 may first obtain the search terms, sent by the terminal 302, that users input on May 9, 2018, for example the search term "what is anger rancour" 3031 and the search term "weather today" 3032. Then, for the search term 3031, the server 301 may identify the search term, determine that it includes target vocabulary 3041 (for example, "anger rancour"), and add the target vocabulary 3041 to the default lexical set 305. For the search term 3032, the server 301 may identify the search term, determine that it does not include target vocabulary, and therefore skip the subsequent step of adding target vocabulary to the default lexical set 305. Finally, the server 301 may generate a new lexical set 306 based on the default lexical set 305 to which the target vocabulary 3041 has been added.
The method provided by the above embodiment of the present application can obtain a new lexical set based on the search terms input by users, which makes the generated new lexical set more comprehensive and helps enrich its content.
With further reference to Fig. 4, a process 400 of another embodiment of the method for generating information is shown. The process 400 of the method for generating information includes the following steps:
Step 401, obtain the search terms input by users within a preset time period.
In the present embodiment, the executing subject of the method for generating information (for example, the server shown in Fig. 1) may obtain, through a wired or wireless connection, the search terms input by users within a preset time period. The preset time period may be a period set in advance by a technician, for example January 1, 2017 to January 1, 2018. A search term input by a user may be one that the user inputs using the executing subject itself or using a terminal communicatively connected to the executing subject (for example, a terminal device shown in Fig. 1). The users may be the users corresponding to the terminals communicatively connected to the executing subject. A search term may be a word, phrase, sentence, or the like used for searching.
Step 402, for each search term among the obtained search terms, perform the following steps: obtain the at least one title text corresponding to the search term; for each title text among the at least one title text, match the title text against the search term to determine whether the search term includes target vocabulary; and, in response to determining that it does, add the target vocabulary to the default lexical set.
In the present embodiment, each search term among the obtained search terms corresponds to at least one title text, where a title text is a text that a user clicked after inputting the search term. Accordingly, for each search term among the obtained search terms, the executing subject may perform the following steps:
Step 4021, obtain the at least one title text corresponding to the search term.
It can be understood that a user may click multiple title texts after inputting one search term; the search term may therefore correspond to at least one title text.
In practice, the executing subject may store each search term input by a user in association with the title texts the user clicked after inputting it. The executing subject can then, based on the search term, obtain the at least one title text corresponding to it from local storage.
Step 4022, for each title text among the at least one title text, match the title text against the search term to determine whether the search term includes target vocabulary.
Here, the target vocabulary is a word that the search term and the title text both contain.
Specifically, for each title text among the at least one title text, the executing subject may match the title text against the search term using various methods to determine whether the search term includes target vocabulary. For example, it may solve for the longest common substring (LCS) of the title text and the search term; if a longest common substring is found, it may determine that the search term includes target vocabulary. Here, the target vocabulary is the longest common substring found. The longest common substring is the longest substring that two character strings both contain. It should be noted that methods for solving the longest common substring are well-known techniques that are widely studied and applied at present, and are not described further here.
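The longest-common-substring matching in step 4022 can be sketched with the standard dynamic-programming solution. The function name and the sample search term and title are hypothetical; the application does not prescribe a particular algorithm.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous substring shared by `a` and `b`."""
    best_len, best_end = 0, 0
    # prev[j] = length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the common suffix
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


# Hypothetical search term and clicked title text.
search_term = "what is blockchain"
title_text = "blockchain explained"
target = longest_common_substring(search_term, title_text)
if target:  # a non-empty result means the search term includes target vocabulary
    print(target)
```

The routine runs in O(len(a) × len(b)) time and keeps only two rows of the table. In practice a minimum-length filter on the result may be worth adding, since very short common substrings (a single character or a space) are unlikely to be useful vocabulary; that filter is a suggestion rather than something the source specifies.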
Step 4023, in response to determining that the search term includes target vocabulary, add the target vocabulary to the default lexical set.
Here, the default lexical set is a lexical set predetermined by a technician and used for word segmentation.
Step 403, generate a new lexical set based on the default lexical set to which the target vocabulary has been added.
In the present embodiment, the executing subject may generate a new lexical set based on the default lexical set to which the target vocabulary has been added.
Steps 401 and 403 are consistent with steps 201 and 203 of the preceding embodiment, respectively. The descriptions of steps 201 and 203 above also apply to steps 401 and 403 and are not repeated here.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the process 400 of the method for generating information in the present embodiment highlights the step of determining target vocabulary using the at least one title text corresponding to a search term. The present embodiment thus provides another scheme for generating a new lexical set, which helps further enrich the content of the lexical set and improves the diversity of information generation.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of a device for generating information. This device embodiment corresponds to the method embodiment shown in Fig. 2, and the device may specifically be applied to various electronic devices.
As shown in Fig. 5, the device 500 for generating information of the present embodiment includes a first acquisition unit 501, a recognition unit 502, and a generation unit 503. The first acquisition unit 501 is configured to acquire the search terms input by users within a preset time period. The recognition unit 502 is configured to, for each search term among the acquired search terms, identify the search term to determine whether it includes target vocabulary and, in response to determining that it does, add the target vocabulary to the default lexical set. The generation unit 503 is configured to generate a new lexical set based on the default lexical set to which the target vocabulary has been added.
In the present embodiment, the first acquisition unit 501 of the device for generating information may acquire, through a wired or wireless connection, the search terms input by users within a preset time period. The preset time period may be a period set in advance by a technician, for example January 1, 2017 to January 1, 2018. A search term input by a user may be a search term that the user inputs using the device 500, or using a terminal (such as the terminal device shown in Fig. 1) that is communicatively connected to the device 500. The users may be the users corresponding to the terminals communicatively connected to the device 500. A search term may be a word, a phrase, a sentence, or the like used for searching.
In the present embodiment, for each search term among the search terms obtained by the first acquisition unit 501, the recognition unit 502 may identify the search term to determine whether it includes target vocabulary and, in response to determining that it does, add the target vocabulary to the default lexical set. Here, target vocabulary is vocabulary to be added to the default lexical set. The default lexical set is a lexical set that is predetermined by a technician and used for word segmentation.
In the present embodiment, for each search term among the acquired search terms, the recognition unit 502 may use various methods to identify the search term in order to determine whether it includes target vocabulary.
In the present embodiment, the generation unit 503 may generate a new lexical set based on the default lexical set to which the target vocabulary has been added.
Specifically, the generation unit 503 may directly determine the default lexical set to which the target vocabulary has been added as the new lexical set, or it may process that default lexical set and determine the processed default lexical set as the new lexical set.
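The cooperation of the three units can be pictured with a minimal sketch such as the one below. It is only an illustration under stated assumptions: `generate_new_lexical_set`, the `recognize` callback (which returns the target vocabulary found in one search term, or an empty list) and the optional `postprocess` callback are hypothetical names, and the kind of processing applied to the set is not specified by the present application.

```python
def generate_new_lexical_set(search_terms, default_lexical_set, recognize, postprocess=None):
    """Add recognized target vocabulary to a copy of the default lexical set.

    recognize(term) -> iterable of target vocabulary items (hypothetical callback).
    postprocess(set) -> set, an optional extra processing step (hypothetical).
    """
    working = set(default_lexical_set)  # copy, so the caller's set is not mutated
    for term in search_terms:
        for target in recognize(term):
            working.add(target)
    # Either use the extended set directly, or process it first.
    return postprocess(working) if postprocess else working
```

For example, a caller might pass a recognizer built on title-text matching or named entity recognition, and a `postprocess` function that filters out unwanted entries.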
In some optional implementations of the present embodiment, each search term among the acquired search terms corresponds to at least one title text, where a title text is text clicked by a user after inputting the search term. The recognition unit 502 may include: an acquisition module (not shown in the figure), configured to, for each search term among the acquired search terms, acquire the at least one title text corresponding to the search term; and a matching module (not shown in the figure), configured to, for each title text in the at least one title text, match the title text against the search term.
In some optional implementations of the present embodiment, the recognition unit 502 may include: a word segmentation module (not shown in the figure), configured to, for each search term among the acquired search terms, segment the search term to obtain a word sequence; and an execution module (not shown in the figure), configured to, in response to determining that the obtained word sequence includes at least two words, execute the following steps for each word of the at least two words: determine the association coefficient corresponding to the word and the word adjacent to it in the word sequence, where the association coefficient characterizes the degree of association between the word and its adjacent word; and, in response to determining that the determined association coefficient is greater than or equal to a preset threshold, determine that the search term includes target vocabulary, where the target vocabulary is the word formed by combining the word with its adjacent word in the word sequence.
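The adjacent-word check described above can be sketched as follows. This passage does not say how the association coefficient is computed, so pointwise mutual information (PMI) over a corpus of already segmented search terms is used here purely as an assumption; all function names are hypothetical.

```python
import math
from collections import Counter


def build_counts(segmented_terms):
    """Count single words and adjacent word pairs in segmented search terms."""
    unigrams, bigrams = Counter(), Counter()
    for words in segmented_terms:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def association(left, right, unigrams, bigrams):
    """PMI of an adjacent pair (an assumed coefficient); -inf if the pair never occurs."""
    pair = bigrams[(left, right)]
    if pair == 0:
        return float("-inf")
    p_pair = pair / sum(bigrams.values())
    p_left = unigrams[left] / sum(unigrams.values())
    p_right = unigrams[right] / sum(unigrams.values())
    return math.log(p_pair / (p_left * p_right))


def merge_candidates(words, unigrams, bigrams, threshold):
    """Return combined words whose association coefficient reaches the threshold."""
    if len(words) < 2:  # the check only applies to sequences of at least two words
        return []
    return [left + right
            for left, right in zip(words, words[1:])
            if association(left, right, unigrams, bigrams) >= threshold]
```

The words are joined without a separator, which suits Chinese text; for space-delimited languages a separator would be needed.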
In some optional implementations of the present embodiment, the recognition unit 502 may be further configured to: for each search term among the acquired search terms, perform named entity recognition on the search term to obtain a recognition result, where the recognition result indicates whether the search term includes target vocabulary, and the target vocabulary is a named entity.
In some optional implementations of the present embodiment, performing named entity recognition on the search term to obtain a recognition result includes: performing named entity recognition on the search term using a pre-trained named entity recognition model to obtain the recognition result.
It can be understood that the units recorded in the device 500 correspond to the steps of the method described with reference to Fig. 2. Accordingly, the operations, features, and beneficial effects described above for the method also apply to the device 500 and the units it contains, and are not repeated here.
The device 500 provided by the above embodiment of the present application can obtain a new lexical set based on the search terms input by users, which improves the comprehensiveness of the generated new lexical set and helps to enrich the content of the lexical set.
Referring to Fig. 6, a process 600 of an embodiment of the method for word segmentation provided by the present application is shown. The method for word segmentation may include the following steps:
Step 601: acquire a search term input by a user.
In the present embodiment, the executing subject of the method for word segmentation (such as the server 105 shown in Fig. 1) may acquire the search term input by the user through a wired or wireless connection. Here, a search term is a word, a phrase, a sentence, or the like used for searching. The search term input by the user may be a search term that the user inputs using the above executing subject, or using a terminal (such as the terminal device shown in Fig. 1) that is communicatively connected to the above executing subject.
Step 602: segment the acquired search term based on the new lexical set to obtain a word segmentation result.
In the present embodiment, the above executing subject may segment the search term obtained in step 601 based on the new lexical set to obtain a word segmentation result. The word segmentation result includes the words obtained by segmentation. Specifically, as an example, the word segmentation result may be a word sequence composed of the words obtained by segmentation. The words in the word sequence may be arranged in the order in which they appear in the acquired search term. The new lexical set may be generated using the method described in the embodiment of Fig. 2 above. For the specific generation process, refer to the description of the Fig. 2 embodiment, which is not repeated here.
Specifically, the above executing subject may, based on the new lexical set, use various methods to segment the acquired search term and obtain a word segmentation result. For example, a forward maximum matching algorithm, a backward maximum matching algorithm, a forward minimum matching algorithm, a backward minimum matching algorithm, or the like may be used to segment the acquired search term and obtain the word segmentation result.
It should be noted that word segmentation methods based on a lexical set are well-known techniques that are widely studied and applied at present, and are not described in detail here.
The method provided by the above embodiment of the present application can segment the search term input by a user based on the new lexical set, which improves the accuracy of word segmentation.
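As one concrete instance of the lexical-set-based methods mentioned above, forward maximum matching can be sketched as follows. The function name and the `max_word_len` parameter are assumptions made for this example; characters that match no entry are emitted as single-character words.

```python
def forward_max_match(text, lexicon, max_word_len=None):
    """Segment text greedily from left to right, preferring the longest lexicon entry."""
    if max_word_len is None:
        # Longest entry in the lexical set, but never less than one character.
        max_word_len = max(1, max((len(w) for w in lexicon), default=1))
    result, i = [], 0
    while i < len(text):
        # Try the longest window first and shrink until an entry matches.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                result.append(piece)
                i += size
                break
    return result
```

With a lexical set containing only "王者", "荣耀" and "攻略", the search term "王者荣耀攻略" is split into three words; once "王者荣耀" has been added as target vocabulary, the same search term is split into "王者荣耀" and "攻略".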
With continued reference to Fig. 7, as an implementation of the method shown in Fig. 6 above, the present application provides an embodiment of a device for word segmentation. This device embodiment corresponds to the method embodiment shown in Fig. 6, and the device may specifically be applied to various electronic devices.
As shown in Fig. 7, the device 700 for word segmentation of the present embodiment may include a second acquisition unit 701 and a word segmentation unit 702. The second acquisition unit 701 is configured to acquire a search term input by a user. The word segmentation unit 702 is configured to segment the acquired search term, based on the new lexical set generated using the method described in the embodiment of Fig. 2 above, to obtain a word segmentation result.
In the present embodiment, the second acquisition unit 701 of the device for word segmentation may acquire the search term input by the user through a wired or wireless connection. Here, a search term is a word, a phrase, a sentence, or the like used for searching. The search term input by the user may be a search term that the user inputs using the device 700, or using a terminal (such as the terminal device shown in Fig. 1) that is communicatively connected to the device 700.
In the present embodiment, the word segmentation unit 702 may segment the search term obtained by the second acquisition unit 701 based on the new lexical set to obtain a word segmentation result. The word segmentation result includes the words obtained by segmentation.
Specifically, the word segmentation unit 702 may, based on the new lexical set, use various methods to segment the acquired search term and obtain a word segmentation result.
It can be understood that the units recorded in the device 700 correspond to the steps of the method described with reference to Fig. 6. Accordingly, the operations, features, and beneficial effects described above for the method also apply to the device 700 and the units it contains, and are not repeated here.
The device provided by the above embodiment of the present application can segment the search term input by a user based on the new lexical set, which improves the accuracy of word segmentation.
Referring now to Fig. 8, a structural schematic diagram of a computer system 800 of an electronic device (such as the terminal device or server shown in Fig. 1) suitable for implementing the embodiments of the present application is shown. The electronic device shown in Fig. 8 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in Fig. 8, the computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803. The RAM 803 also stores the various programs and data required for the operation of the system 800. The CPU 801, the ROM 802, and the RAM 803 are connected to one another through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read from it can be installed into the storage section 808 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product that comprises a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 809, and/or installed from the removable medium 811. When the computer program is executed by the central processing unit (CPU) 801, the above functions defined in the method of the present application are performed. It should be noted that the computer-readable medium described herein may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in connection with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and such a medium can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wireless, electric wire, optical cable, RF, and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to the various embodiments of the present application. In this regard, each box in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, this may be described as: a processor including a first acquisition unit, a recognition unit, and a generation unit. The names of these units do not, in certain cases, constitute a limitation on the units themselves; for example, the first acquisition unit may also be described as "a unit for acquiring search terms input by users".
As another aspect, the present application also provides a computer-readable medium. The computer-readable medium may be included in the electronic device described in the above embodiments, or it may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire the search terms input by users within a preset time period; for each search term among the acquired search terms, identify the search term to determine whether it includes target vocabulary and, in response to determining that it does, add the target vocabulary to the default lexical set; and generate a new lexical set based on the default lexical set to which the target vocabulary has been added.
In addition, when the one or more programs are executed by the electronic device, they may also cause the electronic device to: acquire a search term input by a user; and segment the acquired search term, based on the new lexical set generated using the method for generating information described in the above embodiments, to obtain a word segmentation result.
The above description is only of the preferred embodiments of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with technical features disclosed in (but not limited to) the present application that have similar functions.

Claims (14)

1. A method for generating information, comprising:
acquiring search terms input by users within a preset time period;
for each search term among the acquired search terms, identifying the search term to determine whether the search term includes target vocabulary, and, in response to determining that it does, adding the target vocabulary to a default lexical set; and
generating a new lexical set based on the default lexical set to which the target vocabulary has been added.
2. The method according to claim 1, wherein each search term among the acquired search terms corresponds to at least one title text, a title text being text clicked by a user after inputting the search term; and
the identifying the search term comprises:
acquiring the at least one title text corresponding to the search term; and
for each title text in the at least one title text, matching the title text against the search term.
3. The method according to claim 1, wherein the identifying the search term to determine whether the search term includes target vocabulary comprises:
segmenting the search term to obtain a word sequence; and
in response to determining that the obtained word sequence includes at least two words, executing the following steps for each word of the at least two words: determining an association coefficient corresponding to the word and the word adjacent to it in the word sequence, wherein the association coefficient characterizes the degree of association between the word and its adjacent word; and, in response to determining that the determined association coefficient is greater than or equal to a preset threshold, determining that the search term includes target vocabulary, wherein the target vocabulary is the word formed by combining the word with its adjacent word in the word sequence.
4. The method according to claim 1, wherein the identifying the search term to determine whether the search term includes target vocabulary comprises:
performing named entity recognition on the search term to obtain a recognition result, wherein the recognition result indicates whether the search term includes target vocabulary, and the target vocabulary is a named entity.
5. The method according to claim 4, wherein the performing named entity recognition on the search term to obtain a recognition result comprises:
performing named entity recognition on the search term using a pre-trained named entity recognition model to obtain the recognition result.
6. A method for word segmentation, comprising:
acquiring a search term input by a user; and
segmenting the acquired search term, based on a new lexical set generated using the method according to any one of claims 1-5, to obtain a word segmentation result.
7. A device for generating information, comprising:
a first acquisition unit, configured to acquire search terms input by users within a preset time period;
a recognition unit, configured to, for each search term among the acquired search terms, identify the search term to determine whether the search term includes target vocabulary, and, in response to determining that it does, add the target vocabulary to a default lexical set; and
a generation unit, configured to generate a new lexical set based on the default lexical set to which the target vocabulary has been added.
8. The device according to claim 7, wherein each search term among the acquired search terms corresponds to at least one title text, a title text being text clicked by a user after inputting the search term; and
the recognition unit comprises:
an acquisition module, configured to, for each search term among the acquired search terms, acquire the at least one title text corresponding to the search term; and
a matching module, configured to, for each title text in the at least one title text, match the title text against the search term.
9. The device according to claim 7, wherein the recognition unit comprises:
a word segmentation module, configured to, for each search term among the acquired search terms, segment the search term to obtain a word sequence; and
an execution module, configured to, in response to determining that the obtained word sequence includes at least two words, execute the following steps for each word of the at least two words: determining an association coefficient corresponding to the word and the word adjacent to it in the word sequence, wherein the association coefficient characterizes the degree of association between the word and its adjacent word; and, in response to determining that the determined association coefficient is greater than or equal to a preset threshold, determining that the search term includes target vocabulary, wherein the target vocabulary is the word formed by combining the word with its adjacent word in the word sequence.
10. The device according to claim 7, wherein the recognition unit is further configured to:
for each search term among the acquired search terms, perform named entity recognition on the search term to obtain a recognition result, wherein the recognition result indicates whether the search term includes target vocabulary, and the target vocabulary is a named entity.
11. The device according to claim 10, wherein the performing named entity recognition on the search term to obtain a recognition result comprises:
performing named entity recognition on the search term using a pre-trained named entity recognition model to obtain the recognition result.
12. A device for word segmentation, comprising:
a second acquisition unit, configured to acquire a search term input by a user; and
a word segmentation unit, configured to segment the acquired search term, based on a new lexical set generated using the method according to any one of claims 1-5, to obtain a word segmentation result.
13. An electronic device, comprising:
one or more processors; and
a storage device on which one or more programs are stored,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-6.
14. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-6.
CN201811075458.8A 2018-09-14 2018-09-14 Method and apparatus for generating information Pending CN109241296A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811075458.8A CN109241296A (en) 2018-09-14 2018-09-14 Method and apparatus for generating information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811075458.8A CN109241296A (en) 2018-09-14 2018-09-14 Method and apparatus for generating information

Publications (1)

Publication Number Publication Date
CN109241296A true CN109241296A (en) 2019-01-18

Family

ID=65058115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811075458.8A Pending CN109241296A (en) 2018-09-14 2018-09-14 Method and apparatus for generating information

Country Status (1)

Country Link
CN (1) CN109241296A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110489742A (en) * 2019-07-15 2019-11-22 北京三快在线科技有限公司 A kind of segmenting method, device, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101046809A (en) * 2006-03-28 2007-10-03 吴风勇 New word identification method based on association rule model
CN102521263A (en) * 2011-11-21 2012-06-27 北京百度网讯科技有限公司 Method and device for obtaining subject vocabulary entry
CN103106227A (en) * 2012-08-03 2013-05-15 人民搜索网络股份公司 System and method of looking up new word based on webpage text
US9251428B2 (en) * 2009-07-18 2016-02-02 Abbyy Development Llc Entering information through an OCR-enabled viewfinder
CN105528421A (en) * 2015-12-07 2016-04-27 中国人民大学 Search dimension excavation method of query terms in mass data
CN106033462A (en) * 2015-03-19 2016-10-19 科大讯飞股份有限公司 Neologism discovering method and system
CN106339481A (en) * 2016-08-30 2017-01-18 电子科技大学 Chinese compound new-word discovery method based on maximum confidence coefficient
CN107944032A (en) * 2017-12-13 2018-04-20 北京百度网讯科技有限公司 Method and apparatus for generating information

Similar Documents

Publication Publication Date Title
US10824874B2 (en) Method and apparatus for processing video
CN107491534B (en) Information processing method and device
CN109190124B (en) Method and apparatus for participle
CN108305626A (en) The sound control method and device of application program
CN107908789A (en) Method and apparatus for generating information
US11758088B2 (en) Method and apparatus for aligning paragraph and video
CN108228906B (en) Method and apparatus for generating information
CN108932220A (en) article generation method and device
CN108121800A (en) Information generating method and device based on artificial intelligence
CN109522486A (en) Method and apparatus for match information
CN107731229A (en) Method and apparatus for identifying voice
CN109299477A (en) Method and apparatus for generating text header
CN109635094A (en) Method and apparatus for generating answer
CN108933730A (en) Information-pushing method and device
CN109543058A (en) For the method for detection image, electronic equipment and computer-readable medium
CN109582825B (en) Method and apparatus for generating information
CN106919711A (en) The method and apparatus of the markup information based on artificial intelligence
CN109933217A (en) Method and apparatus for pushing sentence
CN108280200A (en) Method and apparatus for pushed information
CN108121699A (en) For the method and apparatus of output information
CN110084658A (en) The matched method and apparatus of article
CN109920431A (en) Method and apparatus for output information
CN109325178A (en) Method and apparatus for handling information
CN110019948A (en) Method and apparatus for output information
CN110245334A (en) Method and apparatus for output information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190118