Novel high throughput sequencing technologies have redefined the way genome sequencing is performed. They are able to produce millions of short sequences in a single experiment and with a much lower cost than previous methods. In this paper, we address the problem of efficiently mapping and classifying millions of degenerate and weighted sequences to a reference genome, based on whether they occur exactly once in the genome or not, and by taking into consideration probability scores. In particular, we design parallel algorithms for Massive Exact and Approximate Unique Pattern Matching for degenerate and weighted sequences derived from high throughput sequencing technologies.
Prague Stringology Conference 2009 (PSC 2009). Proceedings of the Prague Stringology Conference 2009 (Prague, Czech Republic 31 August - 2 September, 2009) p. 249-262