CD-HIT is a very widely used program for clustering and comparing protein or
nucleotide sequences.

WWW: http://weizhong-lab.ucsd.edu/cd-hit/