Materials included: complete thesis | Word count: 13181 | Format: Word (*.doc) |
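Abstract (Chinese version, translated): As technology advances, people increasingly need to find useful information in large volumes of data. Automatic recognition of spatial relations is an important topic that has emerged against this background, and it is also an important task in natural language processing. It uses relation extraction techniques to predict the category of unseen instances from a set of labeled instances. Unlike traditional single-label classification, a pair of geographic entities may belong to several categories at once, so this is a multi-label classification problem. After studying multi-label classification algorithms, this thesis selects the k-nearest-neighbor-based multi-label classification algorithm (ML-KNN) for spatial relation extraction. ML-KNN is a multi-label classification algorithm that extends the KNN algorithm. It first finds the K nearest neighbors of each instance to be classified in the training set and then, from the categories of those neighbors, applies the maximum a posteriori principle to decide whether the instance carries each possible label. To compute the similarity between spatial instances, this thesis uses a method based on the extended subsequence kernel. The experimental data are 188 Chinese documents collected from the 《百科全書》 (Encyclopedia). These 188 documents are partitioned at random: 3/4 serve as training documents and the remaining 1/4 as test documents. ML-KNN is used to extract the spatial relations, and the experimental results are analyzed with evaluation metrics for multi-label classification. Keywords: machine learning; spatial relations; relation extraction; multi-label classification; K nearest neighbors; extended subsequence kernel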
Abstract: With the development of technology, people have an increasing need to extract useful information from large amounts of data. Automatic recognition of spatial relations is an important subject that has arisen against this background, and it is also an important task in natural language processing. It uses relation extraction to classify unseen instances on the basis of labeled instances. Unlike traditional single-label classification, a pair of geographic entities may belong to multiple categories at the same time, so automatic recognition of spatial relations is a multi-label classification problem. After studying multi-label algorithms, we choose the multi-label K nearest neighbors (ML-KNN) algorithm for spatial relation extraction. ML-KNN is a multi-label classification algorithm derived from the traditional K nearest neighbors (KNN) algorithm. First, for each unseen instance, its K nearest neighbors in the training set are identified. Then, based on the number of neighboring instances belonging to each possible class, the maximum a posteriori principle is used to predict the label set of the unseen instance. To calculate the similarity between instances, we use the subsequence kernel method. Our experimental data are 188 Chinese documents collected from the Encyclopedia. We randomly pick 3/4 of these documents as training documents and use the remaining 1/4 as test documents. We extract spatial relations with the ML-KNN method and analyse the experimental results. Keywords: Machine Learning; Spatial Relations; Relation Extraction; Multi-Label Classification; K Nearest Neighbors; Subsequence Kernel
|
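The two ML-KNN steps described in the abstract (find the K nearest training neighbors, then apply the maximum a posteriori rule to each label) can be sketched as follows. This is a minimal illustration and not the thesis's implementation: the class `MLKNN`, the helper `top_k`, the smoothing parameter `smooth`, the toy token data, and the two label names are all hypothetical. Jaccard token overlap stands in for the subsequence kernel, which the abstract does not specify in detail.

```python
from typing import Callable, List, Sequence, Tuple

Tokens = Tuple[str, ...]

def jaccard(a: Tokens, b: Tokens) -> float:
    """Stand-in similarity (NOT the subsequence kernel): token-set overlap."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def top_k(sims: Sequence[float], k: int, skip: int = -1) -> List[int]:
    """Indices of the k most similar items, optionally skipping one index."""
    order = sorted((i for i in range(len(sims)) if i != skip),
                   key=lambda i: sims[i], reverse=True)
    return order[:k]

class MLKNN:
    def __init__(self, k: int = 2, smooth: float = 1.0):
        self.k, self.smooth = k, smooth

    def fit(self, X: List[Tokens], Y: List[List[int]],
            sim: Callable[[Tokens, Tokens], float]) -> "MLKNN":
        self.X, self.Y, self.sim = X, Y, sim
        m, q, k, s = len(X), len(Y[0]), self.k, self.smooth
        # Smoothed prior probability that an instance carries label l.
        self.prior = [(s + sum(y[l] for y in Y)) / (2 * s + m) for l in range(q)]
        # has[l][c]: training instances WITH label l whose k neighbors
        # include exactly c instances carrying l; lacks[l][c]: same, WITHOUT l.
        has = [[0] * (k + 1) for _ in range(q)]
        lacks = [[0] * (k + 1) for _ in range(q)]
        for i in range(m):
            nbrs = top_k([sim(X[i], x) for x in X], k, skip=i)
            for l in range(q):
                c = sum(Y[j][l] for j in nbrs)
                (has if Y[i][l] else lacks)[l][c] += 1
        # Smoothed likelihoods of seeing c labeled neighbors, given the label
        # is present (like1) or absent (like0).
        self.like1 = [[(s + has[l][c]) / (s * (k + 1) + sum(has[l]))
                       for c in range(k + 1)] for l in range(q)]
        self.like0 = [[(s + lacks[l][c]) / (s * (k + 1) + sum(lacks[l]))
                       for c in range(k + 1)] for l in range(q)]
        return self

    def predict(self, x: Tokens) -> List[int]:
        nbrs = top_k([self.sim(x, t) for t in self.X], self.k)
        labels = []
        for l in range(len(self.prior)):
            c = sum(self.Y[j][l] for j in nbrs)
            p1 = self.prior[l] * self.like1[l][c]        # label present
            p0 = (1 - self.prior[l]) * self.like0[l][c]  # label absent
            labels.append(1 if p1 > p0 else 0)
        return labels

# Toy data; label columns are hypothetical: [direction, containment].
X_train = [("north", "of", "border"), ("north", "of", "river"),
           ("north", "of", "city"), ("inside", "the", "province"),
           ("inside", "the", "county"), ("inside", "the", "region")]
Y_train = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]

model = MLKNN(k=2).fit(X_train, Y_train, jaccard)
print(model.predict(("north", "of", "lake")))
print(model.predict(("inside", "the", "lake")))
```

Each label is decided independently, which is what lets an instance receive several labels at once when its neighborhood supports them. Because the similarity function is passed in as an argument, a subsequence kernel could replace `jaccard` without changing the classifier.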