A comparative study of pivot selection strategies for unsupervised cross-domain sentiment classification

Xia Cui; Noor Al-Bazzaz; Danushka Bollegala; Frans Coenen

doi:10.1017/S0269888918000085

Abstract: Selecting pivot features that connect a source domain to a target domain is an important first step in unsupervised domain adaptation (UDA). Although different strategies for selecting pivots, such as the frequency of a feature in a domain and mutual (or pointwise mutual) information, have been proposed in prior work on domain adaptation (DA), there has been no comparative study of (a) how the pivots selected by existing strategies differ and (b) how the pivot selection strategy affects the performance of a target DA task. In this paper, we perform a comparative study covering strategies that select pivots for UDA using labelled data (available for the source domain only) as well as unlabelled data (available for both the source and target domains). Our experiments show that in most cases pivot selection strategies that use labelled data outperform their unlabelled counterparts, emphasising the importance of source domain labelled data for UDA. Moreover, pointwise mutual information and frequency-based pivot selection strategies obtain the best performance in two state-of-the-art UDA methods.
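To make the labelled, information-theoretic strategies mentioned above concrete, here is a minimal Python sketch, not the authors' implementation. It assumes whitespace-tokenised unigram features counted once per document, source labels 1 (positive) and 0 (negative), and a hypothetical PMI pivothood rule |PMI(f, +) - PMI(f, -)| with add-alpha smoothing. All function names are illustrative. Jaccard overlap of two top-k pivot lists is shown as one simple way to quantify how much the pivots chosen by two strategies differ.

```python
import math
from collections import Counter

def _doc_features(doc):
    # Assumption: unigram features, counted once per document (binary presence).
    return set(doc.split())

def mutual_information(docs, labels):
    """MI (natural log) between a feature's presence and the sentiment label."""
    n = len(docs)
    label_counts = Counter(labels)
    joint = Counter()        # (feature, label) -> docs containing feature with that label
    feat_counts = Counter()  # feature -> docs containing feature
    for doc, y in zip(docs, labels):
        for f in _doc_features(doc):
            joint[(f, y)] += 1
            feat_counts[f] += 1
    scores = {}
    for f, n_f in feat_counts.items():
        mi = 0.0
        for y, n_y in label_counts.items():
            n_present = joint[(f, y)]    # feature present, label y
            n_absent = n_y - n_present   # feature absent, label y
            for n_xy, n_x in ((n_present, n_f), (n_absent, n - n_f)):
                if n_xy > 0:  # 0 * log 0 is taken as 0
                    mi += (n_xy / n) * math.log(n * n_xy / (n_x * n_y))
        scores[f] = mi
    return scores

def pmi_pivothood(docs, labels, alpha=1.0):
    """Hypothetical pivothood score |PMI(f, +) - PMI(f, -)|.

    p(f) cancels in the difference, leaving |log p(f|+) - log p(f|-)|;
    add-alpha smoothing keeps both conditional probabilities non-zero.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    pos, neg = Counter(), Counter()
    for doc, y in zip(docs, labels):
        (pos if y == 1 else neg).update(_doc_features(doc))
    scores = {}
    for f in pos.keys() | neg.keys():
        p_pos = (pos[f] + alpha) / (n_pos + 2 * alpha)
        p_neg = (neg[f] + alpha) / (n_neg + 2 * alpha)
        scores[f] = abs(math.log(p_pos / p_neg))
    return scores

def top_k(scores, k):
    # Highest scores first; ties broken alphabetically for reproducibility.
    return [f for f, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]

def jaccard(a, b):
    # Overlap between two pivot sets: |A & B| / |A | B|.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Toy source-domain labelled reviews (1 = positive, 0 = negative).
docs = ["great plot great acting", "excellent great read",
        "boring plot poor acting", "poor waste boring"]
labels = [1, 1, 0, 0]
mi = mutual_information(docs, labels)
pmi = pmi_pivothood(docs, labels)
print(top_k(mi, 3), top_k(pmi, 3), jaccard(top_k(mi, 3), top_k(pmi, 3)))
```

Binary presence counts are used here because reviews are short; switching to raw token counts, or to other tokenisation, would change the estimated probabilities and possibly the ranking.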


Department of Computer Science

Published online: 27 June 2018


Acknowledgments

The authors would like to thank all the anonymous reviewers, and the editors for their support.

Note that the original proposal by Blitzer et al. (2007) was to use mutual information with source domain labelled data as we discuss later in Section 3.2. However, for comparison purposes we define a pivothood score based on frequency and source domain labelled data here.
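The note above does not spell out the frequency-based score, so the following Python sketch assumes two illustrative definitions rather than reproducing the paper's formulas. The unlabelled score takes the smaller of a feature's document frequencies in the source and target domains, so that a high score requires the feature to be frequent in both. The labelled score, a hypothetical stand-in, is the absolute difference between a feature's document frequencies in positive (1) and negative (0) source reviews. All function names are illustrative.

```python
from collections import Counter

def doc_freq(docs):
    """Number of documents containing each whitespace-separated feature."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.split()))
    return counts

def freq_unlabelled(source_docs, target_docs):
    # Assumed rule: a pivot should be frequent in BOTH domains,
    # so score by the smaller of its two document frequencies.
    s, t = doc_freq(source_docs), doc_freq(target_docs)
    return {f: min(s[f], t[f]) for f in s.keys() & t.keys()}

def freq_labelled(source_docs, labels):
    # Assumed rule: a feature scores highly if its frequency differs
    # strongly between positive (1) and negative (0) source reviews.
    pos = doc_freq([d for d, y in zip(source_docs, labels) if y == 1])
    neg = doc_freq([d for d, y in zip(source_docs, labels) if y == 0])
    return {f: abs(pos[f] - neg[f]) for f in pos.keys() | neg.keys()}

def select_pivots(scores, k):
    # Highest scores first; ties broken alphabetically for reproducibility.
    return [f for f, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]

# Toy data: labelled source reviews and unlabelled target reviews.
source = ["great plot great acting", "excellent great read",
          "boring plot poor acting", "poor waste boring"]
labels = [1, 1, 0, 0]
target = ["great sound poor battery", "excellent sound",
          "poor battery boring design"]

print(select_pivots(freq_unlabelled(source, target), 2))
print(select_pivots(freq_labelled(source, labels), 3))
```

Note that the unlabelled score ignores sentiment entirely: "plot" and "acting" would be acceptable pivots under it if they occurred in the target domain, whereas the labelled score gives both a score of zero because they appear equally often in positive and negative reviews.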

http://www.cs.jhu.edu/~mdredze/datasets/sentiment/


Cite this article

Xia Cui, Noor Al-Bazzaz, Danushka Bollegala, Frans Coenen. 2018. A comparative study of pivot selection strategies for unsupervised cross-domain sentiment classification. The Knowledge Engineering Review. 33:85 doi: 10.1017/S0269888918000085
