Comparing Statistical and Data Mining Techniques for Enrichment Ontology with Instances
DOI:
https://doi.org/10.6000/1929-7092.2017.06.39
Keywords:
Ontology Enrichment, Statistical Technique, Classification, Conditional Random Fields (CRFs), Feature-weighted k-Nearest Neighbor
Abstract
Enriching an ontology with instances is an important task because it extends the knowledge in the ontology to cover the domain of interest more fully, so that greater benefits can be obtained. Many techniques exist for classifying instances of concepts; two popular families are statistical methods and data mining methods. This paper compares the two for classifying instances to enrich an ontology with greater domain knowledge, using conditional random fields (CRFs) as the statistical method and feature-weighted k-nearest neighbor classification as the data mining method. The experiments are conducted on a tourism ontology. The results show that conditional random fields achieve higher precision and recall than feature-weighted k-nearest neighbor classification: the F1-measure is 74.09% for conditional random fields and 60.04% for feature-weighted k-nearest neighbor classification.
References
Carlson, A., J. Betteridge, R. Wang, Jr. E. Hruschka and T. Mitchell. 2010. “Coupled Semi-Supervised Learning for Information Extraction” In Proceedings of the third ACM International Conference on Web Search and Data Mining (WSDM ’10).
https://doi.org/10.1145/1718487.1718501
Chinchor, N. 1998. “MUC-7 named entity task definition dry run version, version 3.5” Proceedings of the Seventh Message Understanding Conference (MUC-7) (to appear). Fairfax, Virginia: Morgan Kaufmann Publishers, Inc.
Cimiano, P., Ladwig, G., and Staab, S. 2005. “Gimme’ The Context: Context-driven Automatic Semantic Annotation with C-PANKOW” In Proceedings of the 14th World Wide Web Conference (WWW).
https://doi.org/10.1145/1060745.1060796
Etzioni, O., Cafarella, M., Downey, D., Kok, S., Popescu, A. M., Shaked, T., Soderland, S., Weld, D., and Yates A. 2004. “Web-scale information extraction in KnowItAll” In Proceedings of the 13th World Wide Web Conference (WWW-04), pp. 100-110.
https://doi.org/10.1145/988672.988687
Faria, C., Girardi, R. and Novais, P. 2012. “Using Domain Specific Generated Rules for Automatic Ontology Population” Proceedings of 12th International Conference on Intelligent Systems Design and Applications.
https://doi.org/10.1109/isda.2012.6416554
Giuliano, C., and Gliozzo, A. 2008. “Instance-Based Ontology Population Exploiting Named Entity Substitution” In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008).
https://doi.org/10.3115/1599081.1599115
Imsombut, A. and Paireekreng, W. 2016. “Extract Knowledge for Populating Thai Tourism Ontology from Texts Using Feature-weighted k-Nearest Neighbor” Proceedings of 1st International Conference on Information Technology.
Imsombut, A. and Sirikayon, C. 2016. “An Alternative Technique for Populating Thai Tourism Ontology from Texts Based on Machine Learning” Proceedings of 15th IEEE/ACIS International Conference on Computer and Information Science (ICIS 2016).
https://doi.org/10.1109/ICIS.2016.7550762
J.M. Ruiz-Martínez, J.A. Miñarro-Giménez, D. Castellanos-Nieves, F. García-Sánchez, R. Valencia-Garcia. 2011. “Ontology Population: An Application for the e-tourism Domain” International Journal of Innovative Computing, Information and Control (IJICIC), 7 (11) (2011), pp. 6115–6134.
Lafferty, J., A. McCallum, and F. Pereira. 2001. “Conditional Random Fields: Probabilistic Models for Sequence Data” Proceedings of 18th ICML. San Francisco.
Nanba, H., Taguma, H., Ozaki, T., Kobayashi, D., Ishino, A. & Takezawa, T. 2009. “Automatic Compilation of Travel Information from Automatically Identified Travel Blogs” Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP).
Van Rijsbergen, C. J. 1979. “Information Retrieval” Butterworth, 2nd edition.
Vivencio, D.P., Hruschka Jr., E.R., Nicoletti, M., Santos, E., Galvao, S. 2007. “Feature-weighted k-nearest neighbor classifier” In Proceedings of IEEE Symposium on Foundations of Computational Intelligence (FOCI 2007).
Yihao Zhang, Jianyi Guo, YU Zhengtao, Yao Xianming, Zhang Shaomin. 2009. “Automatic Entity Relation Extraction for the Field of Tourism” Journal of Computational Information System, vol. 5, no. 6, pp. 1653-1659.
Zhang, Z. and Ciravegna, F. 2011. “Named Entity Recognition for Ontology Population using Background Knowledge from Wikipedia” in Ontology Learning and Knowledge Discovery Using the Web: Challenges and Recent Advances, IGI Global.
Published
2017-06-09
How to Cite
Imsombut, A., & Kajornrit, J. (2017). Comparing Statistical and Data Mining Techniques for Enrichment Ontology with Instances. Journal of Reviews on Global Economics, 6, 375–379. https://doi.org/10.6000/1929-7092.2017.06.39
Section
Special Issue - Recent Topical Research on Global, Energy, Health & Medical, and Tourism Economics, and Global Software
License
Policy for Journals/Articles with Open Access
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are permitted and encouraged to post links to their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
Policy for Journals / Manuscript with Paid Access
Authors who publish with this journal agree to the following terms:
- The publisher retains copyright.
- Authors are permitted and encouraged to post links to their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.