Chinese Journal of Computers (计算机学报) 2013/36:12 PP.2545-2559
Locating transcription factor binding sites (TFBS), a task known as motif discovery, is crucial for understanding gene regulatory relationships. This paper proposes a novel fixed-position projection refinement algorithm (FPPR) to identify TFBS in DNA sequences. FPPR clusters the DNA data into subsets through a projection based on the corresponding probabilistic frequency matrix, then filters the subsets by information score and complexity; the retained subsets serve as the initial conditions for expectation-maximization (EM) refinement. By setting a threshold in the fixed-position projection, FPPR handles the different distributions of motif instances under the OOPS, ZOOPS, and TCM models. FPPR can also be extended to discover multiple motifs by using the similarity function WIC. Experiments on real datasets show that the algorithm finds real motifs accurately within a reasonable running time. Compared with MEME, GAME, Motif Sampler, and GALP-F, FPPR performs better, and it handles multiple-motif discovery effectively.
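The projection-and-filter stage can be illustrated with a minimal sketch. It is not the FPPR implementation: the abstract does not specify the projection, the score, or the thresholds, so the code below assumes a generic scheme in which every l-mer is bucketed by the letters at a fixed set of positions, each bucket is summarized as a position frequency matrix, and buckets are kept when their information content against a uniform background passes a threshold. All function names, the pseudocount, and the parameters `min_size` and `min_score` are hypothetical, and the complexity filter and the EM refinement are omitted.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: sequences contain only uppercase A, C, G, T


def project_buckets(sequences, width, positions):
    """Group every l-mer of the given width by its letters at the fixed positions."""
    buckets = defaultdict(list)
    for seq in sequences:
        for start in range(len(seq) - width + 1):
            lmer = seq[start:start + width]
            key = "".join(lmer[p] for p in positions)
            buckets[key].append(lmer)
    return buckets


def frequency_matrix(lmers, pseudocount=1.0):
    """Build a position frequency matrix (one dict per column) with a pseudocount."""
    width = len(lmers[0])
    matrix = []
    for col in range(width):
        counts = {base: pseudocount for base in ALPHABET}
        for lmer in lmers:
            counts[lmer[col]] += 1
        total = sum(counts.values())
        matrix.append({base: counts[base] / total for base in ALPHABET})
    return matrix


def information_content(matrix, background=0.25):
    """Sum over columns of the relative entropy (in bits) against a uniform background."""
    return sum(
        p * math.log2(p / background)
        for column in matrix
        for p in column.values()
    )


def candidate_seeds(sequences, width, positions, min_size, min_score):
    """Keep buckets that are large enough and informative enough (hypothetical filter)."""
    seeds = []
    for key, lmers in project_buckets(sequences, width, positions).items():
        if len(lmers) < min_size:
            continue
        matrix = frequency_matrix(lmers)
        score = information_content(matrix)
        if score >= min_score:
            seeds.append((key, score, matrix))
    # Highest-scoring buckets first; each matrix could seed a refinement step.
    return sorted(seeds, key=lambda seed: seed[1], reverse=True)


if __name__ == "__main__":
    toy = ["ACGTACGT", "TTACGTAA"]  # made-up sequences, for illustration only
    for key, score, _ in candidate_seeds(toy, width=4, positions=(0, 1),
                                         min_size=3, min_score=0.0):
        print(key, round(score, 3))
```

In this sketch the size threshold stands in for the way a projection threshold might control how many instances per sequence are admitted. How FPPR actually maps thresholds to the OOPS, ZOOPS, and TCM models is not described in the abstract.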