2018 Volume 33
RESEARCH ARTICLE   Open Access    

A comparative study of pivot selection strategies for unsupervised cross-domain sentiment classification

  • Abstract: Selecting pivot features that connect a source domain to a target domain is an important first step in unsupervised domain adaptation (UDA). Although different strategies for selecting pivots, such as the frequency of a feature in a domain and mutual (or pointwise mutual) information, have been proposed in prior work on domain adaptation (DA), it remains unknown (a) how the pivots selected using existing strategies differ, and (b) how the pivot selection strategy affects the performance of a target DA task. In this paper, we perform a comparative study covering strategies that use labelled data (available for the source domain only) as well as unlabelled data (available for both the source and target domains) to select pivots for UDA. Our experiments show that in most cases pivot selection strategies that use labelled data outperform their unlabelled counterparts, emphasising the importance of source domain labelled data for UDA. Moreover, pointwise mutual information and frequency-based pivot selection strategies obtain the best performance in two state-of-the-art UDA methods.
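    As a rough illustration of the kinds of strategies the abstract names, the sketch below scores candidate pivots in two ways: an unlabelled frequency score and a labelled pointwise mutual information (PMI) score. The function names, the use of document frequency, the min-over-domains rule, and the absolute-difference form of the labelled score are assumptions made for this example and may differ from the definitions used in the paper.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Number of documents in which each feature occurs at least once."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc))
    return counts


def freq_unlabelled(source_docs, target_docs):
    """Assumed unlabelled score: min of the source and target document frequencies."""
    src = document_frequency(source_docs)
    tgt = document_frequency(target_docs)
    # Only features seen in both domains can act as pivots.
    return {x: min(src[x], tgt[x]) for x in src.keys() & tgt.keys()}


def pmi(joint, feature_total, subset_size, n_docs):
    """PMI between a feature and a document subset, from raw counts."""
    if joint == 0:
        return 0.0  # simplification for this sketch: treat unseen pairs as 0
    p_joint = joint / n_docs
    p_feature = feature_total / n_docs
    p_subset = subset_size / n_docs
    return math.log(p_joint / (p_feature * p_subset))


def pmi_labelled(pos_docs, neg_docs):
    """Assumed labelled score: |PMI(x, positive) - PMI(x, negative)| on source data."""
    pos = document_frequency(pos_docs)
    neg = document_frequency(neg_docs)
    n_docs = len(pos_docs) + len(neg_docs)
    scores = {}
    for x in pos.keys() | neg.keys():
        total = pos[x] + neg[x]
        scores[x] = abs(
            pmi(pos[x], total, len(pos_docs), n_docs)
            - pmi(neg[x], total, len(neg_docs), n_docs)
        )
    return scores


def top_k(scores, k):
    """Return the k highest-scoring features, breaking ties alphabetically."""
    return sorted(scores, key=lambda x: (-scores[x], x))[:k]


# Toy data (hypothetical): tokenised reviews from a source and a target domain.
source_pos = [["excellent", "book"], ["excellent", "plot"]]
source_neg = [["boring", "book"], ["boring", "plot"]]
target = [["excellent", "blender"], ["boring", "blender"], ["book"]]

freq_scores = freq_unlabelled(source_pos + source_neg, target)
pmi_scores = pmi_labelled(source_pos, source_neg)
print(top_k(freq_scores, 2))
print(top_k(pmi_scores, 2))
```

    On this toy data the labelled score favours sentiment-bearing words such as "excellent" and "boring" over a topical word such as "book", which occurs equally often in positive and negative reviews. The unlabelled frequency score cannot make that distinction.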
  • Blitzer J., Dredze M. & Pereira F. 2007. Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of the ACL, 440–447.

    Blitzer J., McDonald R. & Pereira F. 2006. Domain adaptation with structural correspondence learning. In Proceedings of the EMNLP, 120–128.

    Bollegala D., Mu T. & Goulermas J. Y. 2015. Cross-domain sentiment classification using sentiment sensitive embeddings. IEEE Transactions on Knowledge and Data Engineering 28(2), 398–410.

    Bollegala D., Weir D. & Carroll J. 2011. Using multiple sources to construct a sentiment sensitive thesaurus for cross-domain sentiment classification. In Proceedings of the ACL, 132–141.

    Bollegala D., Weir D. & Carroll J. 2014. Learning to predict distributions of words across domains. In Proceedings of the ACL, 613–623.

    Church K. W. & Hanks P. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics 16(1), 22–29.

    Jiang J. & Zhai C. 2007. Instance weighting for domain adaptation in NLP. In Proceedings of the ACL, 264–271.

    Koehn P. & Schroeder J. 2007. Experiments in domain adaptation for statistical machine translation. In Proceedings of the Second Workshop on Statistical Machine Translation, 224–227.

    Kübler S. & Baucom E. 2011. Fast domain adaptation for part of speech tagging for dialogues. In Proceedings of the RANLP, 41–48.

    Li S. & Zong C. 2008. Multi-domain sentiment classification. In ACL 2008 (short papers), 257–260.

    Liu Y. & Zhang Y. 2012. Unsupervised domain adaptation for joint segmentation and POS-tagging. In Proceedings of the COLING, 745–754.

    Manning C. D. & Schütze H. 1999. Foundations of Statistical Natural Language Processing. MIT Press.

    Mansour R. H., Refaei N., Gamon M., Sami K. & Abdel-Hamid A. 2013. Revisiting the old kitchen sink: do we need sentiment domain adaptation? In Proceedings of the RANLP, 420–427.

    Pan S. J., Ni X., Sun J.-T., Yang Q. & Chen Z. 2010. Cross-domain sentiment classification via spectral feature alignment. In Proceedings of WWW, 751–760.

    Pang B., Lee L. & Vaithyanathan S. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the EMNLP, 79–86.

    Schnabel T. & Schütze H. 2013. Towards robust cross-domain domain adaptation for part-of-speech tagging. In Proceedings of the IJCNLP, 198–206.

    Turney P. 2006. Similarity of semantic relations. Computational Linguistics 32(3), 379–416.

    Turney P. D. 2001. Mining the web for synonyms: PMI-IR versus LSA on TOEFL. In Proceedings of the ECML-2001, 491–502.

    Yu J. & Jiang J. 2015. A hassle-free unsupervised domain adaptation method using instance similarity features. In Proceedings of the ACL-IJCNLP, 168–173.

    Zhang Y., Xu X. & Hu X. 2015. A common subspace construction method in cross-domain sentiment classification. In Proceedings of International Conference on Electronic Science and Automation Control (ESAC), 48–52.

  • Cite this article

    Xia Cui, Noor Al-Bazzaz, Danushka Bollegala, Frans Coenen. 2018. A comparative study of pivot selection strategies for unsupervised cross-domain sentiment classification. The Knowledge Engineering Review 33(1), doi: 10.1017/S0269888918000085

    • The authors would like to thank the anonymous reviewers and the editors for their support.

    • Note that the original proposal by Blitzer et al. (2007) was to use mutual information with source domain labelled data as we discuss later in Section 3.2. However, for comparison purposes we define a pivothood score based on frequency and source domain labelled data here.

    • http://www.cs.jhu.edu/~mdredze/datasets/sentiment/

    • © Cambridge University Press, 2018