Comparing Statistical and Data Mining Techniques for Enrichment Ontology with Instances

Aurawan Imsombut; Jesada Kajornrit

doi:10.6000/1929-7092.2017.06.39

Authors

Aurawan Imsombut, College of Creative Design and Entertainment Technology, Dhurakij Pundit University, Bangkok, Thailand
Jesada Kajornrit, College of Innovative Technology and Engineering, Dhurakij Pundit University, Bangkok, Thailand

Keywords:

Ontology Enrichment, Statistical Technique, Classification, Conditional Random Fields (CRFs), Feature-weighted k-Nearest Neighbor

Abstract

Enriching an ontology with instances is an important task because it extends the knowledge in the ontology to cover the domain of interest more extensively, so that greater benefits can be obtained. Many techniques exist for classifying instances of concepts; two popular families are statistical methods and data mining methods. This paper compares the two for classifying instances in order to enrich an ontology with more domain knowledge, using conditional random fields (CRFs) as the statistical method and feature-weighted k-nearest neighbor classification as the data mining method. The experiments are conducted on a tourism ontology. The results show that conditional random fields achieve higher precision and recall than feature-weighted k-nearest neighbor classification: the F1-measure is 74.09% for conditional random fields and 60.04% for feature-weighted k-nearest neighbor classification.
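
To illustrate the general idea behind feature-weighted k-nearest neighbor classification, the following is a minimal sketch and not the authors' implementation. It assumes a weighted Euclidean distance and a simple majority vote; the function names, the toy feature vectors, the concept labels ("Hotel", "Temple"), and the weight values are all hypothetical, since the abstract does not describe the features or the weighting scheme actually used.

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Euclidean distance in which each feature's squared difference is scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def fw_knn_predict(train, query, weights, k=3):
    """Label `query` by majority vote among its k nearest training examples.

    train   : list of (feature_vector, label) pairs
    weights : one non-negative weight per feature (assumed given here;
              in practice they would be estimated from data)
    """
    neighbors = sorted(
        train, key=lambda item: weighted_distance(item[0], query, weights)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: the first two features are informative,
# the third is a large-scale feature unrelated to the concept.
train = [
    ((1.0, 0.0, 10.0), "Hotel"),
    ((0.9, 0.1, 12.0), "Hotel"),
    ((0.0, 1.0, 50.0), "Temple"),
    ((0.1, 0.9, 48.0), "Temple"),
]
query = (0.95, 0.05, 49.0)

uniform = fw_knn_predict(train, query, weights=(1.0, 1.0, 1.0))
weighted = fw_knn_predict(train, query, weights=(1.0, 1.0, 0.0))
print(uniform, weighted)
```

In this toy setting, uniform weights let the uninformative third feature dominate the distance, whereas setting its weight to zero lets the two informative features decide the label. This is only meant to show why weighting features can change which neighbors are selected.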
