Includes: complete thesis
Thesis length: 9,365 characters
Thesis format: Word (*.doc)
Abstract: As the Internet has become an indispensable medium for disseminating information in daily life, the way people obtain information has changed fundamentally: instead of passively receiving information, they now actively seek out what interests them. Faced with the massive volume of information online, however, being able to quickly find information that is valuable or of interest becomes extremely important.

Keyword indexing of content is currently one of the most effective approaches to information retrieval, and it is widely used in search engines and other Internet applications. Compared with retrieving information by searching the full text, searching only keywords is much faster and places far lower demands on system performance, so it offers higher benefit at lower cost. The information itself, however, does not explicitly mark its keywords, and assigning them manually is expensive. To improve the speed and quality of keyword indexing, having machines index information automatically has therefore become a meaningful research topic, and automated keyword indexing is a future research direction for information processing on the Internet.

This thesis introduces the research background of keyword indexing and the current state of research in China and abroad. To address the differences between Chinese and English words, it segments Chinese text into words and extracts features of Chinese keywords. It designs a keyword indexing algorithm that combines statistical information, semantic analysis, and machine learning, and reports good results in experiments.

Keywords: keyword, index, extraction, keyword indexing, information retrieval
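The abstract's claim that searching keywords is cheaper than searching full text can be illustrated with a small inverted index. This is only a sketch under stated assumptions: the sample documents, their keyword lists, and the function names `build_keyword_index`, `search_keyword_index`, and `search_full_text` are all hypothetical and do not come from the thesis.

```python
from collections import defaultdict


def build_keyword_index(doc_keywords):
    """Map each keyword to the set of document ids tagged with it."""
    index = defaultdict(set)
    for doc_id, keywords in doc_keywords.items():
        for kw in keywords:
            index[kw].add(doc_id)
    return index


def search_keyword_index(index, query):
    """Look the query up directly in the index (a single dictionary access)."""
    return sorted(index.get(query, set()))


def search_full_text(docs, query):
    """Baseline: scan the full text of every document for the query string."""
    return sorted(doc_id for doc_id, text in docs.items() if query in text)


# Hypothetical sample data, for illustration only.
docs = {
    1: "search engines rank web pages by relevance",
    2: "keyword indexing speeds up information retrieval",
    3: "web pages can be tagged with keywords for retrieval",
}
doc_keywords = {
    1: ["search engine", "ranking"],
    2: ["keyword indexing", "retrieval"],
    3: ["keywords", "retrieval"],
}
index = build_keyword_index(doc_keywords)
```

Both `search_keyword_index(index, "retrieval")` and `search_full_text(docs, "retrieval")` find documents 2 and 3, but the indexed lookup does not touch the document text at all. The catch, as the abstract notes, is that someone or something has to supply the keyword lists in the first place.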
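The main goal of this project is to enable machines to extract keywords from text automatically. Because Chinese words differ from English words, some methods developed abroad do not apply to indexing Chinese text, so keyword extraction specifically for Chinese text is an important research direction of this project. Moreover, depending on the hierarchical relationships within the content, keywords need not be limited to words extracted from the text itself: any word that summarizes the content of the whole article should be eligible as a keyword, even if it never appears in the text. This requires the machine to have some capacity for learning and association when it processes text.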
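To make the segmentation issue concrete: Chinese text has no spaces between words, so candidate keywords must first be segmented out before any statistics can be computed. The sketch below is not the thesis's algorithm, which the source does not specify. It assumes forward maximum matching against a small hypothetical vocabulary for segmentation, and a smoothed TF-IDF score as the statistical signal. The names `forward_max_match` and `tfidf_keywords`, the vocabulary, and the sample corpus are all illustrative assumptions.

```python
import math
from collections import Counter


def forward_max_match(text, vocab, max_len=4):
    """Greedy segmentation: at each position take the longest vocabulary word,
    falling back to a single character when nothing matches."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def tfidf_keywords(doc_tokens, corpus_tokens, top_k=3):
    """Rank the terms of one document by term frequency times a smoothed
    inverse document frequency computed over the corpus."""
    tf = Counter(doc_tokens)
    n_docs = len(corpus_tokens)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for tokens in corpus_tokens if term in tokens)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF variant
        scores[term] = (count / len(doc_tokens)) * idf
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Hypothetical vocabulary and corpus, for illustration only.
vocab = {"關鍵詞", "索引", "信息", "檢索"}
corpus = [
    forward_max_match("關鍵詞索引關鍵詞", vocab),
    forward_max_match("信息檢索", vocab),
    forward_max_match("信息索引", vocab),
]
top = tfidf_keywords(corpus[0], corpus, top_k=1)
```

This sketch only proposes words that occur in the text. Producing keywords that summarize an article without appearing in it, as the passage above calls for, would need additional semantic resources or a learned model, which this example does not attempt.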