CN106919711B - Method and device for labeling information based on artificial intelligence - Google Patents

Publication number
CN106919711B
Authority
CN
China
Prior art keywords
information
tag
keyword set
keyword
keywords
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710144895.XA
Other languages
Chinese (zh)
Other versions
CN106919711A (en)
Inventor
曹宇慧
闭玮
刘志慧
何径舟
周古月
沈亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201710144895.XA
Publication of CN106919711A
Application granted
Publication of CN106919711B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9562 Bookmark management

Abstract

The application discloses a method and device for labeling information based on artificial intelligence. One embodiment of the method comprises: acquiring information to be labeled; extracting keywords of the information to be labeled to generate a first keyword set; matching the first keyword set with a preset keyword set and obtaining the successfully matched first keywords in the first keyword set to generate a second keyword set; determining the matching degree between the second keyword set and the tag keyword set in each piece of tag information in a preset tag information set, wherein the tag keywords in the tag keyword set are at least partially identical to the preset keywords in the preset keyword set; selecting tag information from the tag information set as target tag information based on the determined matching degree; and labeling the information to be labeled with the tag content in the target tag information. This embodiment reduces the time a user spends browsing information.

Description

Method and device for labeling information based on artificial intelligence
Technical Field
The application relates to the technical field of computers, in particular to the technical field of internet, and particularly relates to an information labeling method and device based on artificial intelligence.
Background
Artificial intelligence, abbreviated AI, is a new technical science that researches and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. It is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Research in this field includes robotics, language recognition, image recognition, natural language processing, and expert systems, among others.
News is the general term for reports that convey information through media such as newspapers, radio, television, and the internet; it is a genre that records society, conveys information, and reflects the times. Generally, a news item may include parts such as a headline, a lead, a body, a background, and a conclusion. The headline may generally include an introductory title, a main title, and a subtitle. The lead may briefly reveal the core content of the news. The body presents the subject matter of the news with sufficient facts and further extends and explains the lead. The background may be the social and natural environment in which the news occurred.
However, existing news arranges the lead, body, background, conclusion, and other parts in many different ways, so a user needs to spend a lot of time browsing a news item to find the information they need.
Disclosure of Invention
The present application is directed to an improved method and apparatus for tagging information based on artificial intelligence, which solves the technical problems mentioned in the background section above.
In a first aspect, an embodiment of the present application provides a method for labeling information based on artificial intelligence, where the method includes: acquiring information to be marked; extracting keywords of information to be labeled to generate a first keyword set; matching the first keyword set with a preset keyword set to obtain first keywords which are successfully matched in the first keyword set so as to generate a second keyword set; determining the matching degree between the second keyword set and a tag keyword set in each tag information in a preset tag information set, wherein the tag information comprises tag content and the tag keyword set, and the tag keywords in the tag keyword set are at least partially the same as the preset keywords in the preset keyword set; selecting label information from the label information set as target label information based on the determined matching degree; and marking the label content in the target label information on the information to be marked.
In some embodiments, the tag information further includes a weight for each tag keyword in the set of tag keywords; and determining the matching degree between the second keyword set and the tag keyword set in each tag information in the preset tag information set, including: for the label keyword set in each label information in the label information set, matching the label keyword set with the second keyword set, obtaining the successfully matched label keywords to generate a third keyword set, and obtaining the weight of each third keyword in the third keyword set; and adding the weight of each third keyword in the third keyword set to obtain the matching degree between the second keyword set and the label keyword set.
In some embodiments, selecting the tag information from the tag information set as the target tag information based on the determined matching degree includes: selecting the tag information with the highest matching degree from the tag information set as the target tag information.
In some embodiments, selecting the tag information from the tag information set as the target tag information based on the determined matching degree includes: selecting the tag information with a matching degree greater than a first preset threshold from the tag information set as the target tag information.
In some embodiments, extracting the keywords of the information to be labeled to generate a first keyword set includes: performing word segmentation on the information to be labeled and acquiring keywords of the information to be labeled to generate the first keyword set.
In a second aspect, an embodiment of the present application provides an apparatus for labeling information based on artificial intelligence, where the apparatus includes: the acquisition unit is configured to acquire information to be marked; the extraction unit is used for extracting keywords of the information to be labeled so as to generate a first keyword set; the matching unit is configured to match the first keyword set with a preset keyword set, and obtain successfully matched first keywords in the first keyword set to generate a second keyword set; the determining unit is configured to determine a matching degree between the second keyword set and a tag keyword set in each tag information in a preset tag information set, wherein the tag information includes tag content and the tag keyword set, and the tag keywords in the tag keyword set are at least partially the same as the preset keywords in the preset keyword set; the selecting unit is configured to select the label information from the label information set as target label information based on the determined matching degree; and the labeling unit is configured to label the label content in the target label information on the information to be labeled.
In some embodiments, the tag information further includes a weight for each tag keyword in the set of tag keywords; and the determination unit includes: the matching subunit is configured to match the tag keyword set with the second keyword set for each tag information in the tag information set, obtain successfully matched tag keywords to generate a third keyword set, and obtain a weight of each third keyword in the third keyword set; and the calculating subunit is configured to add the weight of each third keyword in the third keyword set to obtain the matching degree between the second keyword set and the tag keyword set.
In some embodiments, the selecting unit is further configured to: select the tag information with the highest matching degree from the tag information set as the target tag information.
In some embodiments, the selecting unit is further configured to: select the tag information with a matching degree greater than a first preset threshold from the tag information set as the target tag information.
In some embodiments, the extraction unit is further configured to: perform word segmentation on the information to be labeled and acquire keywords of the information to be labeled to generate the first keyword set.
In a third aspect, an embodiment of the present application provides a server, where the server includes: one or more processors; storage means for storing one or more programs which, when executed by one or more processors, cause the one or more processors to carry out a method as described in any one of the implementations of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method as described in any implementation manner of the first aspect.
According to the method and the device for marking information based on artificial intelligence, a first keyword set of information to be marked is extracted; then matching the first keyword set with a preset keyword set to obtain a second keyword set; then determining the matching degree between the second keyword set and the label keyword set in each label information in the label information set; and finally, based on the determined matching degree, selecting target label information from the label information set, and labeling the label content in the target label information on the information to be labeled. The information is automatically labeled with the label content after being analyzed through artificial intelligence, so that a user can quickly judge whether the information is the information required by the user through the label content labeled on the information, and the time spent by the user for browsing the information is saved.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram to which embodiments of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for artificial intelligence based tagging information according to the present application;
FIG. 3 is a diagram illustrating an application scenario of a method for tagging information based on artificial intelligence according to an embodiment of the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method for artificial intelligence based tagging information according to the present application;
FIG. 5 is a schematic block diagram illustrating one embodiment of an apparatus for artificial intelligence based annotation information according to the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing a server according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
FIG. 1 illustrates an exemplary system architecture 100 to which embodiments of the artificial intelligence based tagging information method or apparatus of the present application may be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, a server 105, and a database server 106. The network 104 is the medium used to provide communication links between the terminal devices 101, 102, 103, the server 105, and the database server 106. The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as news-like applications, web browser applications, search-like applications, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting information browsing, including but not limited to smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like.
The database server 106 may be a database server that stores information to be annotated.
The server 105 may be a server that provides various services, such as a background information server that provides support for information displayed on the terminal devices 101, 102, 103. The background information server may analyze and otherwise process information displayed on the terminal apparatuses 101, 102, and 103 and feed back a processing result (for example, information indicating tag content) to the terminal apparatuses 101, 102, and 103.
It should be noted that the method for tagging information based on artificial intelligence provided by the embodiment of the present application is generally executed by the server 105, and accordingly, the apparatus for tagging information based on artificial intelligence is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks, servers, and database servers in fig. 1 are merely illustrative. There may be any number of terminal devices, networks, servers, and database servers, as desired for implementation. In the case where the information to be annotated is stored in the server 105, the database server 106 may not be provided in the system architecture 100.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for artificial intelligence based annotation of information in accordance with the subject application is illustrated. The information labeling method based on artificial intelligence comprises the following steps:
Step 201, acquiring information to be labeled.
In this embodiment, the electronic device on which the method for labeling information based on artificial intelligence runs (e.g., the server 105 shown in FIG. 1) may acquire the information to be labeled locally or from a database server communicatively connected to the electronic device (e.g., the database server 106 shown in FIG. 1). The information to be labeled may be text information on the internet. As an example, the information to be labeled may be a piece of news, one natural segment of a piece of news, or several natural segments of a piece of news.
Step 202, extracting keywords of the information to be labeled to generate a first keyword set.
In this embodiment, based on the information to be annotated acquired in step 201, the electronic device may perform content analysis on the information to be annotated, so as to extract at least one keyword of the information to be annotated to generate a first keyword set.
In some optional implementations of this embodiment, the electronic device may perform word segmentation on the information to be labeled and obtain keywords of the information to be labeled to generate a first keyword set. As an example, the electronic device may first segment the content of the information to be labeled into words, for example with a full segmentation method; then calculate the importance of the resulting words, for example with TF-IDF (Term Frequency-Inverse Document Frequency); and finally select keywords based on the result of the importance calculation to generate the first keyword set.
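The optional implementation above (segment, score importance, select) can be sketched as follows. This is only an illustration under stated assumptions: whitespace splitting stands in for word segmentation (Chinese text would need a real segmenter), the reference corpus used for the IDF term is hypothetical, and the names `segment`, `extract_keywords`, and `top_k` are not from the patent.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"


def segment(text):
    """Stand-in for word segmentation: lowercase, split on whitespace, trim punctuation."""
    words = (w.strip(PUNCTUATION) for w in text.lower().split())
    return [w for w in words if w]


def extract_keywords(text, corpus, top_k=5):
    """Return the top_k words of `text` ranked by TF-IDF against `corpus` (assumed inputs)."""
    words = segment(text)
    if not words:
        return set()
    term_counts = Counter(words)
    doc_sets = [set(segment(doc)) for doc in corpus]
    n_docs = len(doc_sets)

    def idf(word):
        # Smoothed IDF so that words absent from the corpus do not divide by zero.
        doc_freq = sum(1 for s in doc_sets if word in s)
        return math.log((1 + n_docs) / (1 + doc_freq)) + 1.0

    scores = {w: (c / len(words)) * idf(w) for w, c in term_counts.items()}
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return set(ranked[:top_k])
```

The returned set plays the role of the first keyword set; how many keywords to keep (`top_k` here) is a design choice the patent leaves open.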
Step 203, matching the first keyword set with a preset keyword set, and acquiring successfully matched first keywords in the first keyword set to generate a second keyword set.
In this embodiment, for each first keyword in the first keyword set extracted in step 202, the electronic device may compare the first keyword with each preset keyword in a preset keyword set one by one. If a preset keyword is the same as the first keyword, matching is successful, and the successfully matched first keyword is used as a second keyword. After all the comparison is completed, if a plurality of successfully matched first keywords exist, at least one first keyword can be selected from the successfully matched first keywords to generate a second keyword set. Generally, the electronic device can select all the successfully matched first keywords to generate a second keyword set.
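A minimal sketch of this matching step, assuming keywords are plain strings and that "the same" means exact string equality. The function name `match_preset` is illustrative and not part of the patent.

```python
def match_preset(first_keywords, preset_keywords):
    """Keep every first keyword that is identical to some preset keyword.

    The result corresponds to the second keyword set when all successfully
    matched first keywords are selected.
    """
    preset = set(preset_keywords)  # set membership replaces one-by-one comparison
    return {kw for kw in first_keywords if kw in preset}
```

For example, `match_preset({"yesterday", "acquisition", "because"}, {"yesterday", "because", "will"})` keeps only `"yesterday"` and `"because"`.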
In general, the preset keyword set may include various types of word sets. For example, the preset keyword set may include a word set representing time, a word set representing an event state, a word set representing a fact, a word set representing a conjecture, a word set representing comments and attitudes, a word set representing a result, and the like. Wherein the set of words representing time may include a set of words representing a past time, a set of words representing a present time, and a set of words representing a future time.
It should be noted that the preset keyword set may be stored in various forms. For example, the preset keyword set may be stored in a data table or an XML (Extensible Markup Language) file. The specific storage form and storage location of the preset keyword set are not limited in this embodiment.
As an example, in the case where the preset keyword set includes a plurality of types of word sets, the preset keyword set may be stored in the form as shown in table 1:
[Table 1 appears as an image in the original publication; its contents are not reproduced here.]
TABLE 1
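As a rough illustration of storing a typed preset keyword set in an XML file, the sketch below parses a small document into a mapping from type name to word set. The element names, type names, and example words are all hypothetical, since the actual contents of Table 1 are not available here.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout and example words; not the patent's actual Table 1.
PRESET_XML = """
<preset_keywords>
  <type name="past_time"><word>yesterday</word><word>previously</word></type>
  <type name="future_time"><word>tomorrow</word><word>will</word></type>
  <type name="conjecture"><word>probably</word><word>may</word></type>
</preset_keywords>
"""


def load_preset_keywords(xml_text):
    """Parse the XML into {type name: set of words}."""
    root = ET.fromstring(xml_text)
    return {
        node.get("name"): {w.text.strip() for w in node.findall("word") if w.text}
        for node in root.findall("type")
    }


def flatten(typed_sets):
    """Union of all typed word sets, i.e. the whole preset keyword set."""
    result = set()
    for words in typed_sets.values():
        result |= words
    return result
```

Keeping the type name alongside each word set makes it possible to look up a word set by type later, which is how the tag keyword sets are described as being retrieved.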
Step 204, determining a matching degree between the second keyword set and the tag keyword set in each tag information in the preset tag information set.
In this embodiment, based on the second keyword set generated in step 203, the electronic device may determine a matching degree between the second keyword set and a tag keyword set in each tag information in a preset tag information set. As an example, for a tag keyword set in each tag information in the tag information set, the electronic device may match each second keyword in the second keyword set with each tag keyword in the tag keyword set one by one, and use a ratio of the number of successfully matched second keywords to the number of all second keywords in the second keyword set as a matching degree between the second keyword set and the tag keyword set.
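The ratio described in this example can be sketched as below. The function name is illustrative, and returning 0.0 for an empty second keyword set is an assumption; the patent does not say how that case is handled.

```python
def matching_degree(second_keywords, tag_keywords):
    """Fraction of second keywords that also appear in the tag keyword set."""
    second = set(second_keywords)
    if not second:
        return 0.0  # assumption: no keywords means no evidence for any tag
    matched = second & set(tag_keywords)
    return len(matched) / len(second)
```

With four second keywords of which one appears in the tag keyword set, the matching degree is 1/4 = 0.25.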
In this embodiment, the tag information may include tag content and a tag keyword set. The tag content of the tag information may be used to indicate which part of the whole a given piece of information belongs to. As an example, for a news item, the tag content of the tag information may include, but is not limited to: event, background, analysis, detail, core, impact, cause, prediction, comment, and the like. The tag keywords in the tag keyword set are at least partially identical to the preset keywords in the preset keyword set. Typically, the tag keyword set is a subset of the preset keyword set. As an example, if the preset keyword set includes a plurality of types of word sets, the tag keyword set may include some of those types of word sets.
It should be noted that the tag information set may be stored in various forms. For example, the set of tag information may be stored in a data table or an XML file. The embodiment does not limit the specific storage form and storage location of the tag information set.
As an example, the tag content in the tag information may be stored in the form as shown in table 2:
[Table 2 appears as an image in the original publication; its contents are not reproduced here.]
TABLE 2
The tag keyword set in the tag information can be searched from table 1 according to the type name corresponding to the tag content in table 2, and the tag keyword set in the tag information does not need to be stored separately.
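The lookup described here can be sketched with two mappings standing in for the two tables. The type names, tag contents, and words below are hypothetical placeholders, not the contents of the patent's tables.

```python
# Hypothetical stand-in for Table 1: type name -> word set.
PRESET_BY_TYPE = {
    "past_time": {"yesterday", "previously"},
    "future_time": {"tomorrow", "will"},
    "fact": {"confirmed", "announced"},
    "conjecture": {"probably", "may"},
}

# Hypothetical stand-in for Table 2: tag content -> type names.
TAG_TYPES = {
    "background": ["past_time", "fact"],
    "prediction": ["future_time", "conjecture"],
}


def tag_keyword_set(tag_content, tag_types=TAG_TYPES, preset_by_type=PRESET_BY_TYPE):
    """Build a tag keyword set by looking up its type names in the preset table."""
    words = set()
    for type_name in tag_types[tag_content]:
        words |= preset_by_type[type_name]
    return words
```

Because each tag keyword set is derived from the preset table on demand, it is automatically a subset of the preset keyword set and needs no separate storage.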
Step 205, selecting tag information from the tag information set as the target tag information based on the determined matching degree.
In this embodiment, based on the matching degree determined in step 204, the electronic device may select tag information from the tag information set as target tag information. The target tag information selected in this embodiment may be one tag information or a plurality of tag information.
In some optional implementation manners of this embodiment, the electronic device may select, as the target tag information, the tag information with the highest matching degree from the tag information set. In this implementation, the target tag information is typically one tag information.
In some optional implementation manners of this embodiment, the electronic device may select, from the tag information set, tag information with a matching degree greater than a first preset threshold as target tag information. In this implementation, the target tag information is typically a plurality of tag information. Wherein the first preset threshold may be set by default.
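Both selection strategies can be sketched as follows, assuming the matching degrees are held in a dictionary keyed by tag content. The function names and the dictionary representation are illustrative choices, not taken from the patent.

```python
def select_highest(degrees):
    """Return the tag content with the highest matching degree, or None if empty.

    When several tags tie, the first one in dictionary order is returned.
    """
    if not degrees:
        return None
    return max(degrees, key=degrees.get)


def select_above_threshold(degrees, threshold):
    """Return every tag content whose matching degree exceeds the threshold."""
    return [tag for tag, degree in degrees.items() if degree > threshold]
```

For `degrees = {"core": 0.6, "cause": 0.3, "background": 0.1}`, the first strategy yields `"core"`, and the second with a threshold of 0.2 yields `["core", "cause"]`.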
Step 206, labeling the information to be labeled with the tag content in the target tag information.
In this embodiment, based on the target tag information selected in step 205, the electronic device may label the tag content in the target tag information on the information to be labeled. As an example, the label content in the target label information may be annotated on the information to be annotated in the form of annotations.
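One possible way to attach tag content as an annotation is sketched below. The patent does not specify an annotation format, so the dictionary structure and the bracketed rendering are assumptions made purely for illustration.

```python
def annotate(segments, tag_contents):
    """Pair each piece of information with the tag content selected for it."""
    if len(segments) != len(tag_contents):
        raise ValueError("one tag content is expected per segment")
    return [{"label": tag, "text": text} for text, tag in zip(segments, tag_contents)]


def render(annotated):
    """Render annotated segments as plain text with the label in brackets."""
    return "\n".join(f"[{item['label']}] {item['text']}" for item in annotated)
```

Keeping the label separate from the text, as `annotate` does, lets a client decide how to display it.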
With continued reference to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the method for labeling information based on artificial intelligence according to this embodiment. In the application scenario of FIG. 3, the server may first obtain a news item entitled "XX buys YY IT services business for $30.55 billion", wherein the news item includes 3 natural segments; the server may then extract the first keywords of the 3 natural segments to generate 3 first keyword sets; the server then matches each of the 3 first keyword sets with a preset keyword set to obtain 3 second keyword sets; the server may then determine the matching degree between each of the 3 second keyword sets and the tag keyword set in each piece of tag information in the preset tag information set; finally, based on the determined matching degrees, the server selects 3 pieces of tag information from the tag information set, one for each natural segment, and labels the 3 natural segments with the tag content of the selected tag information. The tag content labeled on the first natural segment is "core", that on the second natural segment is "cause", and that on the third natural segment is "background". When the user browses the news on the client, it is displayed on the client as shown at 301.
The method for marking information based on artificial intelligence provided by the embodiment of the application extracts a first keyword set of the information to be marked; then matching the first keyword set with a preset keyword set to obtain a second keyword set; then determining the matching degree between the second keyword set and the label keyword set in each label information in the label information set; and finally, based on the determined matching degree, selecting target label information from the label information set, and labeling the label content in the target label information on the information to be labeled. The information is automatically labeled with the label content after being analyzed through artificial intelligence, so that a user can quickly judge whether the information is the information required by the user through the label content labeled on the information, and the time spent by the user for browsing the information is saved.
With further reference to FIG. 4, a flow 400 of yet another embodiment of an artificial intelligence based tagging information method in accordance with the present application is illustrated. The process 400 includes the following steps:
Step 401, acquiring information to be labeled.
In this embodiment, the electronic device on which the method for labeling information based on artificial intelligence runs (e.g., the server 105 shown in FIG. 1) may acquire the information to be labeled locally or from a database server communicatively connected to the electronic device (e.g., the database server 106 shown in FIG. 1). The information to be labeled may be a piece of news, one natural segment of a piece of news, or several natural segments of a piece of news.
Step 402, extracting keywords of information to be labeled to generate a first keyword set.
In this embodiment, based on the information to be annotated acquired in step 401, the electronic device may perform content analysis on the information to be annotated, so as to extract at least one keyword of the information to be annotated to generate a first keyword set.
Step 403, matching the first keyword set with a preset keyword set, and obtaining successfully matched first keywords in the first keyword set to generate a second keyword set.
In this embodiment, for each first keyword in the first keyword set extracted in step 402, the electronic device may compare the first keyword with each preset keyword in a preset keyword set one by one. If a preset keyword is the same as the first keyword, matching is successful, and the successfully matched first keyword is used as a second keyword. After all the comparison is completed, if a plurality of successfully matched first keywords exist, at least one first keyword can be selected from the successfully matched first keywords to generate a second keyword set. Generally, the electronic device can select all the successfully matched first keywords to generate a second keyword set.
Step 404, for the tag keyword set in each tag information in the tag information set, matching the tag keyword set with the second keyword set, obtaining the successfully matched tag keywords to generate a third keyword set, and obtaining the weight of each third keyword in the third keyword set.
In this embodiment, for the tag keyword set in each piece of tag information in the tag information set, the electronic device may compare each tag keyword in the tag keyword set one by one with each second keyword in the second keyword set generated in step 403. If a tag keyword is the same as a second keyword, the match is successful, and the successfully matched tag keyword is used as a third keyword. After all comparisons are completed, at least one third keyword is selected from the tag keyword set to generate a third keyword set. At the same time, the weight of each third keyword in the third keyword set is acquired from the tag information, which may further include a weight for each tag keyword in the tag keyword set.
In the present embodiment, in the case where the tag keyword set includes a plurality of types of word sets, different types of word sets may be given different weights according to their degrees of importance. As an example, the weight of a word in the word set representing time in the tag keyword set may be set to 0.2, and the weight of a word in the word set other than that may be set to 0.1.
Step 405, adding the weight of each third keyword in the third keyword set to obtain the matching degree between the second keyword set and the tag keyword set.
In this embodiment, based on the weight of each third keyword in the third keyword set obtained in step 404, the electronic device may add the weights of each third keyword in the third keyword set, and use the sum as a matching degree between the second keyword set and the tag keyword set.
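Steps 404 and 405 can be sketched together as below, assuming each piece of tag information stores its tag keywords as a mapping from keyword to weight. The function name and the example words are hypothetical; the weights 0.2 (time words) and 0.1 (other words) follow the example given in this embodiment.

```python
def weighted_matching_degree(second_keywords, tag_keyword_weights):
    """Sum the weights of the tag keywords that also occur in the second keyword set."""
    # Third keyword set: tag keywords successfully matched against second keywords.
    third_keywords = set(second_keywords) & set(tag_keyword_weights)
    return sum(tag_keyword_weights[kw] for kw in third_keywords)


# Hypothetical tag keyword set: time words weighted 0.2, other words 0.1.
BACKGROUND_WEIGHTS = {
    "yesterday": 0.2,
    "previously": 0.2,
    "because": 0.1,
    "therefore": 0.1,
}
```

For the second keyword set `{"yesterday", "because", "will"}`, the third keyword set is `{"yesterday", "because"}` and the matching degree is 0.2 + 0.1, about 0.3 (subject to floating-point rounding). Unlike the ratio used in the embodiment of FIG. 2, this sum is not normalized by the size of the second keyword set.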
Step 406, selecting tag information from the tag information set as the target tag information based on the determined matching degree.
In this embodiment, based on the matching degree determined in step 405, the electronic device may select tag information from the tag information set as target tag information. The target tag information selected in this embodiment may be one tag information or a plurality of tag information.
Step 407, labeling the information to be labeled with the tag content in the target tag information.
In this embodiment, based on the target tag information selected in step 406, the electronic device may label the tag content in the target tag information on the information to be labeled. As an example, the label content in the target label information may be annotated on the information to be annotated in the form of annotations.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for labeling information based on artificial intelligence in this embodiment highlights the step of determining the matching degree. Therefore, different types of word sets in the tag keyword set in the scheme described in this embodiment can be given different weights according to the importance degree thereof, so that the obtained matching degree is more accurate.
With further reference to fig. 5, as an implementation of the method shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for tagging information based on artificial intelligence, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 5, the apparatus 500 for labeling information based on artificial intelligence in this embodiment includes an acquisition unit 501, an extraction unit 502, a matching unit 503, a determination unit 504, a selection unit 505, and a labeling unit 506. The acquisition unit 501 is configured to acquire information to be labeled. The extraction unit 502 is configured to extract keywords of the information to be labeled to generate a first keyword set. The matching unit 503 is configured to match the first keyword set with a preset keyword set and obtain the successfully matched first keywords in the first keyword set to generate a second keyword set. The determination unit 504 is configured to determine a matching degree between the second keyword set and the tag keyword set in each piece of tag information in a preset tag information set, where the tag information includes tag content and a tag keyword set, and the tag keywords in the tag keyword set are at least partially the same as the preset keywords in the preset keyword set. The selection unit 505 is configured to select tag information from the tag information set as target tag information based on the determined matching degree. The labeling unit 506 is configured to label the information to be labeled with the tag content in the target tag information.
In this embodiment, for the specific processing of the acquisition unit 501, the extraction unit 502, the matching unit 503, the determination unit 504, the selection unit 505, and the labeling unit 506 of the apparatus 500, and for the technical effects thereof, reference may be made to the descriptions of step 201, step 202, step 203, step 204, step 205, and step 206 in the embodiment corresponding to fig. 2, which are not repeated here.
In some optional implementations of this embodiment, the tag information may further include a weight of each tag keyword in the tag keyword set; and the determination unit 504 may include: a matching subunit (not shown in the figure), configured to, for the tag keyword set in each piece of tag information in the tag information set, match the tag keyword set with the second keyword set, obtain the successfully matched tag keywords to generate a third keyword set, and obtain a weight of each third keyword in the third keyword set; and a calculating subunit (not shown in the figure), configured to add the weights of the third keywords in the third keyword set to obtain the matching degree between the second keyword set and the tag keyword set.
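The cooperation of the six units can be sketched as a small pipeline. This is an illustrative sketch under stated assumptions rather than the apparatus itself: the function names, the preset keyword set, the tag information set, and the regular-expression keyword extraction are all hypothetical stand-ins.

```python
import re

# Hypothetical preset keyword set and tag information set.
PRESET_KEYWORDS = {"tomorrow", "meeting", "flight"}
TAG_INFO_SET = [
    {"content": "schedule", "keywords": {"tomorrow": 0.2, "meeting": 0.1}},
    {"content": "travel", "keywords": {"tomorrow": 0.2, "flight": 0.1}},
]


def extract_keywords(info):
    """Extraction unit: lowercase word tokens stand in for keyword extraction."""
    return set(re.findall(r"[a-z]+", info.lower()))


def match_preset(first_keywords):
    """Matching unit: keep the first keywords that appear in the preset set."""
    return first_keywords & PRESET_KEYWORDS


def determine_degrees(second_keywords):
    """Determination unit: sum the weights of matched tag keywords per tag."""
    return [
        (tag, sum(w for k, w in tag["keywords"].items() if k in second_keywords))
        for tag in TAG_INFO_SET
    ]


def select_target(degrees):
    """Selection unit: pick the tag information with the highest degree."""
    return max(degrees, key=lambda pair: pair[1])[0]


def label(info):
    """Acquire the information, run the units in order, attach the tag content."""
    first = extract_keywords(info)
    second = match_preset(first)
    target = select_target(determine_degrees(second))
    return {"info": info, "label": target["content"]}


result = label("Team meeting tomorrow at ten")
```

Here `result["label"]` is `"schedule"`, because both "meeting" and "tomorrow" match that tag's keywords while only "tomorrow" matches the "travel" tag.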
In some optional implementations of this embodiment, the selection unit 505 may be further configured to select the tag information with the highest matching degree from the tag information set as the target tag information.
In some optional implementations of this embodiment, the selection unit 505 may be further configured to select, from the tag information set, the tag information whose matching degree is greater than a first preset threshold as the target tag information.
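The two selection strategies can be compared side by side in a short sketch. The function names, the sample matching degrees, and the threshold value 0.1 are assumptions chosen for illustration.

```python
def select_highest(degrees):
    """Return the tag whose matching degree is the highest."""
    return max(degrees, key=degrees.get)


def select_above_threshold(degrees, threshold):
    """Return every tag whose matching degree is strictly above the threshold."""
    return [tag for tag, degree in degrees.items() if degree > threshold]


# Hypothetical matching degrees keyed by tag content.
degrees = {"schedule": 0.3, "travel": 0.2, "weather": 0.0}
best = select_highest(degrees)
above = select_above_threshold(degrees, 0.1)
```

The first strategy always yields exactly one tag, while the threshold strategy may yield several tags or none, which matches the statement that the target tag information may be one or more pieces of tag information.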
In some optional implementations of this embodiment, the extraction unit 502 may be further configured to perform word segmentation on the information to be labeled and obtain the keywords of the information to be labeled to generate the first keyword set.
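Word segmentation followed by keyword selection can be sketched as below. The sketch assumes whitespace-delimited text and a hypothetical stop-word list; for unsegmented text such as Chinese, a dedicated word segmenter would take the place of the regular expression.

```python
import re

# Hypothetical stop-word list used to drop words that carry little meaning.
STOP_WORDS = {"the", "a", "an", "at", "is", "on", "to"}


def extract_first_keyword_set(info):
    """Segment the text into lowercase words and keep the non-stop words."""
    tokens = re.findall(r"\w+", info.lower())
    return {token for token in tokens if token not in STOP_WORDS}


first_keywords = extract_first_keyword_set("The flight is at noon tomorrow")
```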
Referring now to FIG. 6, shown is a block diagram of a computer system 600 suitable for use in implementing a server according to embodiments of the present application. The server shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU)601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor comprises an acquisition unit, an extraction unit, a matching unit, a determination unit, a selection unit and a labeling unit. Here, the names of these units do not constitute a limitation to the unit itself in some cases, and for example, the acquisition unit may also be described as a "unit that acquires information to be labeled".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the server described in the above embodiments, or may exist separately without being assembled into the server. The computer-readable medium carries one or more programs which, when executed by the server, cause the server to: acquire information to be labeled; extract keywords of the information to be labeled to generate a first keyword set; match the first keyword set with a preset keyword set and obtain the successfully matched first keywords in the first keyword set to generate a second keyword set; determine a matching degree between the second keyword set and the tag keyword set in each piece of tag information in a preset tag information set, wherein the tag information comprises tag content and a tag keyword set, and the tag keywords in the tag keyword set are at least partially the same as the preset keywords in the preset keyword set; select tag information from the tag information set as target tag information based on the determined matching degree; and label the information to be labeled with the tag content in the target tag information.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (8)

1. A method for labeling information based on artificial intelligence, the method comprising:
acquiring information to be labeled;
extracting keywords of the information to be labeled to generate a first keyword set;
matching the first keyword set with a preset keyword set, and acquiring successfully matched first keywords in the first keyword set to generate a second keyword set;
determining a matching degree between the second keyword set and a tag keyword set in each tag information in a preset tag information set, wherein the tag information comprises tag content and a tag keyword set, and the tag keywords in the tag keyword set are at least partially identical to the preset keywords in the preset keyword set;
based on the determined matching degree, selecting label information from the label information set as target label information;
labeling the label content in the target label information on the information to be labeled;
wherein the tag information further comprises a weight of each tag keyword in the set of tag keywords; and
the determining the matching degree between the second keyword set and the tag keyword set in each tag information in the preset tag information set includes:
for the tag keyword set in each tag information in the tag information set, matching the tag keyword set with the second keyword set, obtaining successfully matched tag keywords to generate a third keyword set, and obtaining the weight of each third keyword in the third keyword set;
and adding the weight of each third keyword in the third keyword set to obtain the matching degree between the second keyword set and the label keyword set.
2. The method of claim 1, wherein selecting label information from the set of label information as target label information based on the determined matching degree comprises:
selecting the label information with the highest matching degree from the label information set as target label information.
3. The method of claim 1, wherein selecting label information from the set of label information as target label information based on the determined matching degree comprises:
selecting the label information with the matching degree larger than a first preset threshold value from the label information set as target label information.
4. The method according to one of claims 1 to 3, wherein the extracting the keywords of the information to be labeled to generate a first keyword set comprises:
segmenting the information to be labeled to obtain the keywords of the information to be labeled so as to generate a first keyword set.
5. An apparatus for labeling information based on artificial intelligence, the apparatus comprising:
an acquisition unit configured to acquire information to be labeled;
an extraction unit configured to extract the keywords of the information to be labeled so as to generate a first keyword set;
a matching unit configured to match the first keyword set with a preset keyword set, and obtain successfully matched first keywords in the first keyword set to generate a second keyword set;
a determination unit configured to determine a matching degree between the second keyword set and a tag keyword set in each tag information in a preset tag information set, wherein the tag information includes tag content and a tag keyword set, and the tag keywords in the tag keyword set are at least partially the same as the preset keywords in the preset keyword set;
a selection unit configured to select label information from the label information set as target label information based on the determined matching degree;
a labeling unit configured to label the label content in the target label information on the information to be labeled;
wherein the tag information further comprises a weight of each tag keyword in the set of tag keywords; and
the determination unit includes:
a matching subunit configured to, for the tag keyword set in each tag information in the tag information set, match the tag keyword set with the second keyword set, obtain successfully matched tag keywords to generate a third keyword set, and obtain a weight of each third keyword in the third keyword set;
and a calculating subunit configured to add the weight of each third keyword in the third keyword set to obtain the matching degree between the second keyword set and the tag keyword set.
6. The apparatus of claim 5, wherein the extraction unit is further configured to:
segment the information to be labeled to obtain the keywords of the information to be labeled so as to generate a first keyword set.
7. A server, characterized in that the server comprises:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-4.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-4.
CN201710144895.XA 2017-03-13 2017-03-13 Method and device for labeling information based on artificial intelligence Active CN106919711B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710144895.XA CN106919711B (en) 2017-03-13 2017-03-13 Method and device for labeling information based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710144895.XA CN106919711B (en) 2017-03-13 2017-03-13 Method and device for labeling information based on artificial intelligence

Publications (2)

Publication Number Publication Date
CN106919711A CN106919711A (en) 2017-07-04
CN106919711B true CN106919711B (en) 2020-10-02

Family

ID=59461669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710144895.XA Active CN106919711B (en) 2017-03-13 2017-03-13 Method and device for labeling information based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN106919711B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107908739A (en) * 2017-11-15 2018-04-13 湖南上容信息技术有限公司 Dynamic syntax analytic method and its resolution system
CN109145261B (en) * 2018-09-04 2022-12-06 北京奇艺世纪科技有限公司 Method and device for generating label
CN109325213B (en) * 2018-09-30 2023-11-28 北京字节跳动网络技术有限公司 Method and device for labeling data
CN109388753A (en) * 2018-10-31 2019-02-26 北京字节跳动网络技术有限公司 Method and apparatus for handling information
CN109815481B (en) * 2018-12-17 2023-05-26 北京百度网讯科技有限公司 Method, device, equipment and computer storage medium for extracting event from text
CN110321544B (en) * 2019-07-08 2023-07-25 北京百度网讯科技有限公司 Method and device for generating information

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103106199A (en) * 2011-11-09 2013-05-15 中国移动通信集团四川有限公司 Text retrieval method and test retrieval device
CN105824930A (en) * 2016-03-17 2016-08-03 深圳市金立通信设备有限公司 Voice message processing method and terminal

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080082486A1 (en) * 2006-09-29 2008-04-03 Yahoo! Inc. Platform for user discovery experience
CN102982076B (en) * 2012-10-30 2015-08-19 新华通讯社 Based on the various dimensions content mask method in semantic label storehouse
CN104317891B (en) * 2014-10-23 2017-11-28 华为软件技术有限公司 A kind of method and device that label is marked to the page
CN106033445B (en) * 2015-03-16 2019-10-25 北京国双科技有限公司 The method and apparatus for obtaining article degree of association data

Also Published As

Publication number Publication date
CN106919711A (en) 2017-07-04

Similar Documents

Publication Publication Date Title
CN106919711B (en) Method and device for labeling information based on artificial intelligence
CN107491547B (en) Search method and device based on artificial intelligence
CN107729319B (en) Method and apparatus for outputting information
CN108804450B (en) Information pushing method and device
CN107346336B (en) Information processing method and device based on artificial intelligence
CN106960030B (en) Information pushing method and device based on artificial intelligence
CN107241260B (en) News pushing method and device based on artificial intelligence
CN107577807B (en) Method and device for pushing information
CN109325213B (en) Method and device for labeling data
CN109145280A (en) The method and apparatus of information push
CN107526718B (en) Method and device for generating text
CN108121699B (en) Method and apparatus for outputting information
CN108280200B (en) Method and device for pushing information
US20150227276A1 (en) Method and system for providing an interactive user guide on a webpage
CN109543058A (en) For the method for detection image, electronic equipment and computer-readable medium
CN106886594B (en) Method and device for displaying information
CN111104479A (en) Data labeling method and device
CN110737824B (en) Content query method and device
CN108038172B (en) Search method and device based on artificial intelligence
CN111026849B (en) Data processing method and device
CN111782850A (en) Object searching method and device based on hand drawing
CN108664511B (en) Method and device for acquiring webpage information
CN113743973A (en) Method and device for analyzing market hotspot trend
CN110110199B (en) Information output method and device
CN108664535B (en) Information output method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant