ALSE - Motif Finding Tool
The University of Hong Kong
Department of Computer Science, The University of Hong Kong

ALSE - An Introduction

   

Command Line Version of ALSE

 

Downloading and Compiling the source code

ALSE is written in C++ and can be compiled with the GCC C++ compiler, version 3.00 or above. You can get the source code from the ALSE webpage. The archive is named ALSE_v????.tar.gz, where v???? is the version number.
In the commands below, the Unix prompt is shown as "$". Unpack the archive and compile ALSE with:

$ tar zxvf ALSE_v????.tar.gz
$ cd ALSE
$ make clean
$ make

If the build succeeds, you should get an executable file named "find_motif".

Running ALSE

SYNOPSIS

    ./find_motif [OPTIONS] -t <true seq file> -f <false seq file>

where

  1.  <true seq file> is the full path name of the sequence file known to contain the motif(s), in FASTA format.
  2.  <false seq file> is the full path name of the sequence file containing the background sequences, also in FASTA format.
  3.  [OPTIONS] specify the parameters used when running ALSE. Any of the following may be given, in any order. If an option is not specified on the command line, ALSE uses its default value.
    -m, --motif
               Length of the motif to be discovered
               (default 6)

    -r, --no_reverse
               Do not test the reversed sequence

    -O, --output
               Output HTML filename
               (default ./output.html)

    -b, --binding
               Maximum number of motifs in a sequence
               (default 2)

    -s, --seeds
               Number of seeds used for s tuning iteration
               (default 100)

    -h, --help
               Display the command and options information
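To summarise the options above and their defaults in one place, here is a small C++ sketch. It is not ALSE's own option parser: it is a hypothetical mapping of each documented flag onto GNU getopt_long (also provided by the BSD C library), and the names AlseOptions and parse_options are illustrative only. No long forms are assumed for -t and -f because none are documented.

```cpp
#include <getopt.h>
#include <cstdlib>
#include <string>

// Hypothetical container for the documented ALSE options, initialised to the stated defaults.
struct AlseOptions {
    int motif_length = 6;                  // -m, --motif
    bool test_reverse = true;              // cleared by -r, --no_reverse
    std::string output = "./output.html";  // -O, --output
    int max_binding = 2;                   // -b, --binding
    int seeds = 100;                       // -s, --seeds
    bool help = false;                     // -h, --help
    std::string true_file;                 // -t <true seq file> (required)
    std::string false_file;                // -f <false seq file> (required)
};

// Parses argv with getopt_long. Returns false on an unknown option or a missing
// argument, or when -t / -f are absent and help was not requested.
// Numeric values use std::atoi, so this sketch does no range or format checking.
bool parse_options(int argc, char* argv[], AlseOptions& opt) {
    static const struct option long_opts[] = {
        {"motif", required_argument, nullptr, 'm'},
        {"no_reverse", no_argument, nullptr, 'r'},
        {"output", required_argument, nullptr, 'O'},
        {"binding", required_argument, nullptr, 'b'},
        {"seeds", required_argument, nullptr, 's'},
        {"help", no_argument, nullptr, 'h'},
        {nullptr, 0, nullptr, 0}};

    int c;
    while ((c = getopt_long(argc, argv, "m:rO:b:s:ht:f:", long_opts, nullptr)) != -1) {
        switch (c) {
            case 'm': opt.motif_length = std::atoi(optarg); break;
            case 'r': opt.test_reverse = false; break;
            case 'O': opt.output = optarg; break;
            case 'b': opt.max_binding = std::atoi(optarg); break;
            case 's': opt.seeds = std::atoi(optarg); break;
            case 'h': opt.help = true; break;
            case 't': opt.true_file = optarg; break;
            case 'f': opt.false_file = optarg; break;
            default: return false;  // getopt_long has already printed a diagnostic
        }
    }
    return opt.help || (!opt.true_file.empty() && !opt.false_file.empty());
}
```

With this sketch, an argument list such as `-m 8 -r -t true.fa -f bg.fa` gives a motif length of 8 and disables reverse testing, while the output file, binding limit and seed count keep their defaults. Because getopt_long tracks its position in the global optind, the function is meant to be called once per process.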
 

Web Version of ALSE

 
 
 

FASTA sequence data Format

 
The FASTA file format stores one or more DNA (or protein) sequences. Each sequence begins with a line containing the character ">" followed by the name of the sequence; the sequence data follow on the next line. Further sequences are listed in the file in the same way.
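[FASTA]          := [SEQUENCES]
[SEQUENCES]      := [SEQUENCE] | [SEQUENCE] [SEQUENCES]
[SEQUENCE]       := >[SEQUENCE-NAME] <newline> [SEQUENCE-DATA] <newline>
[SEQUENCE-NAME]  := [NAME_CHARACTER] | [NAME_CHARACTER] [SEQUENCE-NAME]
[NAME_CHARACTER] := A|B|...|Z | a|b|...|z | 0|1|...|9 | _ | -
[SEQUENCE-DATA]  := [DNA_CHARACTER] | [DNA_CHARACTER] [SEQUENCE-DATA]
[DNA_CHARACTER]  := A|C|G|T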
More information can be found here
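To check an input file against the grammar above before running ALSE, a small validator can help. The C++ sketch below is not part of ALSE, and the names FastaRecord, parse_fasta and parse_fasta_text are hypothetical. It enforces the grammar's character sets (upper-case A, C, G, T for data; letters, digits, _ and - for names). As assumptions not stated in the grammar, it also tolerates Windows (CRLF) line endings, skips blank lines, and concatenates sequence data wrapped over several lines.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One record: the name after '>' and its DNA sequence.
struct FastaRecord {
    std::string name;
    std::string data;
};

// [NAME_CHARACTER]: A-Z, a-z, 0-9, '_' or '-'.
static bool is_name_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_' || c == '-';
}

// [DNA_CHARACTER]: upper-case A, C, G or T only.
static bool is_dna_char(char c) {
    return c == 'A' || c == 'C' || c == 'G' || c == 'T';
}

// Reads FASTA text from `in`. Returns true and fills `out` if the input matches
// the grammar; otherwise returns false with a message in `err`.
bool parse_fasta(std::istream& in, std::vector<FastaRecord>& out, std::string& err) {
    out.clear();
    std::string line;
    std::size_t lineno = 0;
    while (std::getline(in, line)) {
        ++lineno;
        const std::string where = "line " + std::to_string(lineno) + ": ";
        if (!line.empty() && line.back() == '\r') line.pop_back();  // assumption: accept CRLF
        if (line.empty()) continue;                                  // assumption: skip blank lines
        if (line[0] == '>') {
            if (!out.empty() && out.back().data.empty()) {
                err = where + "sequence '" + out.back().name + "' has no data";
                return false;
            }
            const std::string name = line.substr(1);
            if (name.empty()) { err = where + "empty sequence name"; return false; }
            for (char c : name) {
                if (!is_name_char(c)) { err = where + "invalid character in sequence name"; return false; }
            }
            out.push_back(FastaRecord{name, ""});
        } else {
            if (out.empty()) { err = where + "sequence data before the first '>' line"; return false; }
            for (char c : line) {
                if (!is_dna_char(c)) { err = where + "invalid DNA character '" + std::string(1, c) + "'"; return false; }
            }
            out.back().data += line;  // assumption: wrapped data lines are concatenated
        }
    }
    if (out.empty()) { err = "no sequences found"; return false; }
    if (out.back().data.empty()) { err = "sequence '" + out.back().name + "' has no data"; return false; }
    return true;
}

// Convenience wrapper for FASTA text held in memory.
bool parse_fasta_text(const std::string& text, std::vector<FastaRecord>& out, std::string& err) {
    std::istringstream in(text);
    return parse_fasta(in, out, err);
}
```

For a file on disk, pass a std::ifstream to parse_fasta; when it returns false, err names the first offending line. Lower-case bases or ambiguity codes such as N are rejected, since the grammar above lists only A, C, G and T.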
 

Output Format

 
The output of ALSE is an HTML summary of the discovered motifs. A sample of the output can be found here. The page has three main sections.

  1. Program parameters
    Information about the run and the input data, including the options used, the names of the input sequences and the total execution time.

  2. List of motifs
    A list of the motifs found, ordered by p-value. Each motif's p-value and alpha are also displayed.

  3. Details for each motif


 

Email: alse@cs.hku.hk