PERL

NCSAstore.pl

Description:
Append this script to the end of your NCSA batch files to upload output to mass storage in parallel when a run finishes. It should be possible to adapt the script to other supercomputers as well.
Sample usage:
# Designate storage directory on MSS
set STORAGE_DIR="Runs/Flash25/bhl/exe01"
# Make the file list
\ls -1 --hide-control-chars *hdf* *log flash.dat > files.txt

Author's Email:
rge21@pas.rochester.edu

Author's Full Name:
Richard Edgar

Author's Homepage:
http://www.pas.rochester.edu/~rge21/computing/programs/ncsastore/index.shtml

Script File:
NCSAstore.pl (2.8 KB)

EnTuned.pl

Description:
EnTuned.pl reads a newline-delimited list of Ensembl numbers from a text file and accesses the ensembl.org website to find the corresponding Entrez and UniGene numbers. The output is a comma-delimited text file containing 7 fields: the ensembl.org URL for the specific Ensembl number, the Ensembl number, the URL for the Entrez number, the Entrez number, the URL for the UniGene number, the UniGene number, and the description of the gene from the ensembl.org web page. In short, the program converts Ensembl numbers to Entrez and UniGene numbers.

Author's Email:
paul_a_wilson@mac.com

Author's Full Name:
Paul A. Wilson, Ph.D., if a title must be used, author prefers motorcyclist over doctor

Author's Homepage:
homepage.mac.com/paul_a_wilson

Script File:
EnTuned.pl.tar.gz (2.74 KB)

alpha.pl: translate Excel column headers into numerical rank

Description:
alpha.pl:
Translates an alphabetical string of one or two letters into a numerical value using modulo-26 addition; use it to find the column order when parsing very wide Excel input.
Usage:
alpha.pl /excel column header/
alpha.pl AB
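
The letter-to-rank translation described above can be sketched as follows. This is an illustrative Python version, not the archived Perl script: the function name `column_rank` is hypothetical, and it assumes ranks are 1-based (A = 1), which the description does not state.

```python
def column_rank(header: str) -> int:
    """Convert an Excel column header such as 'A' or 'AB' to a 1-based rank.

    Assumption: A maps to 1; the original script may count from 0 instead.
    """
    rank = 0
    for letter in header.strip().upper():
        if not "A" <= letter <= "Z":
            raise ValueError(f"not a column letter: {letter!r}")
        # Treat the header as a base-26 number whose digits run from 1 to 26.
        rank = rank * 26 + (ord(letter) - ord("A") + 1)
    return rank


print(column_rank("AB"))  # 28
```

Under that assumption, column AB is the 28th column: one full pass through A to Z (26), plus B (2).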

Author's Homepage:

Script File:
alpha.pl.gz (571 bytes)

Dinucleotide shuffle with Altschul&Erickson Algorithm

Description:
This script is an implementation of the Altschul & Erickson algorithm for exact dinucleotide shuffling.
The following modules should be installed: Graph, Bio::DB::Fasta, Bio::Seq, Bio::SeqIO. All of them are available from CPAN (http://search.cpan.org).
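
For readers unfamiliar with the technique, here is a minimal Python sketch of a dinucleotide-preserving shuffle in the style of Altschul & Erickson. It is not the archived script and does not use the Perl modules listed above. All function names are hypothetical. The sketch treats the sequence as a walk through a graph of letters, picks a random "last edge" out of each letter, retries until those edges lead to the final letter, and then shuffles the remaining edges.

```python
import random
from collections import Counter, defaultdict


def _reaches(start, goal, last_edge):
    """Follow the chosen last edges from start; True if the path ends at goal."""
    seen = set()
    node = start
    while node != goal:
        if node in seen:  # entered a cycle that never reaches goal
            return False
        seen.add(node)
        node = last_edge[node]
    return True


def dinucleotide_counts(seq):
    """Count every overlapping pair of adjacent letters."""
    return Counter(zip(seq, seq[1:]))


def dinucleotide_shuffle(seq, rng=random):
    """Return a random sequence with the same dinucleotide counts as seq."""
    if len(seq) < 3:
        return seq
    # successors[x] lists every letter that directly follows x in seq.
    successors = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        successors[a].append(b)
    first, last = seq[0], seq[-1]

    # Step 1: choose one last edge per letter (except the final letter),
    # retrying until every letter's chain of last edges reaches the final letter.
    while True:
        last_edge = {v: rng.choice(successors[v]) for v in set(seq) if v != last}
        if all(_reaches(v, last, last_edge) for v in last_edge):
            break

    # Step 2: shuffle each letter's other edges and put its last edge at the end.
    order = {}
    for v, targets in successors.items():
        rest = list(targets)
        if v in last_edge:
            rest.remove(last_edge[v])
        rng.shuffle(rest)
        if v in last_edge:
            rest.append(last_edge[v])
        order[v] = rest

    # Step 3: walk from the first letter, consuming each letter's edges in order.
    out = [first]
    used = defaultdict(int)
    current = first
    for _ in range(len(seq) - 1):
        nxt = order[current][used[current]]
        used[current] += 1
        out.append(nxt)
        current = nxt
    return "".join(out)


original = "ACGTTGCAACGGTTAACCGGATAT"
shuffled = dinucleotide_shuffle(original, random.Random(1))
print(shuffled, dinucleotide_counts(shuffled) == dinucleotide_counts(original))
```

The shuffled sequence keeps the first and last letters of the input, which follows from preserving every dinucleotide count exactly.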

Author's Full Name:
Diego Mauricio Riaño-Pachón

Author's Homepage:
http://www.geocities.com/dmrp.geo

Script File:
dishuffleseq.pl.gz (3.84 KB)

combine/permute a list from a file or pipe

Description:
combo -[pc]
Perform combinatoric transformations on a list of elements
separated by newline. Input may be a filename, or '-' to
read from STDIN. Combinations/Permutations are written to
STDOUT, one per line, with elements separated by tab.
Options:
-p permute list; this is the default behavior
-c combine list; this parameter requires an integer value
for how many of the list elements should be included
in the combination
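
The two modes described above can be illustrated with a short Python sketch. This is not the combo script itself. The function names `parse_list` and `combo` are hypothetical, and the sketch assumes blank input lines are ignored.

```python
import itertools


def parse_list(text):
    """Split newline-separated input into elements, skipping blank lines."""
    return [line for line in text.splitlines() if line]


def combo(items, k=None):
    """Yield tab-joined permutations, or k-element combinations when k is given."""
    if k is None:
        rows = itertools.permutations(items)  # mirrors the default -p behavior
    else:
        rows = itertools.combinations(items, k)  # mirrors -c with an integer value
    for row in rows:
        yield "\t".join(row)


items = parse_list("a\nb\nc\n")
for line in combo(items, k=2):
    print(line)
```

Calling `combo(items)` without `k` yields all six orderings of the three elements, one per line.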

Author's Email:
allenday@ucla.edu

Author's Full Name:
Allen Day

Author's Homepage:
http://search.cpan.org/~allenday

Script File:
combo.gz (746 bytes)

extract_genes.pl

Description:
extract_genes.pl - extract genomic sequences from NCBI files using BioPerl. This script is a simple solution to the problem of extracting genomic regions corresponding to genes. There are other solutions; this particular approach uses genomic sequence files from NCBI and gene coordinates from Entrez Gene.
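
The core step, cutting a gene's region out of a genomic sequence by coordinates, might look like the Python sketch below. It is not the BioPerl script. The function name `extract_region` is hypothetical, and it assumes 1-based inclusive coordinates with strand given as 1 or -1.

```python
# Translation table for complementing DNA, including the ambiguity code N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_region(genome, start, end, strand=1):
    """Return genome[start..end] (1-based, inclusive); reverse-complement if strand is -1."""
    if not 1 <= start <= end <= len(genome):
        raise ValueError(f"coordinates {start}..{end} fall outside the sequence")
    region = genome[start - 1:end]
    if strand == -1:
        # Genes on the minus strand read as the reverse complement.
        region = region.translate(COMPLEMENT)[::-1]
    return region


print(extract_region("AACCGGTT", 3, 6))
print(extract_region("AAACCC", 1, 4, strand=-1))
```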

Author's Email:
osborne1@optonline.net

Author's Full Name:
Brian Osborne

Author's Homepage:
http://bioperl.org

Script File:
extract_genes.pl.zip (1.76 KB)

Updates NCBI Blast Databases (e.g. for cron job)

Description:
fetch_ncbi_db.pl is a script I wrote to automatically update the BLAST databases from NCBI. We regularly need to make sure the databases are up to date, so we set up a weekly cron job to download them. The script does not check whether the files have changed before downloading them, since the databases we use are updated very regularly.

Author's Full Name:
Alexander Richter

Author's Homepage:

Script File:
fetch_ncbi_db.pl.gz (1 KB)

Shannon-Wiener Calculator

Description:
Give this script a data matrix of species abundances in a comma-separated file (each species is a column, each plot is a row). It writes an output file containing your original data plus two additional columns: the total abundance of all species and the Shannon-Wiener Diversity Index.
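
As a worked illustration of the calculation, here is a Python sketch (not the archived Perl script). It makes several assumptions: the input has no header row, the index uses the natural logarithm (H = -sum of p ln p over species with nonzero abundance), and the index is written to four decimal places. The function names are hypothetical.

```python
import csv
import io
import math


def shannon_index(abundances):
    """Return H = -sum(p * ln p) over species with nonzero abundance."""
    total = sum(abundances)
    if total == 0:
        return 0.0
    h = -sum((n / total) * math.log(n / total) for n in abundances if n > 0)
    return h if h > 0 else 0.0  # avoid returning -0.0 for single-species plots


def add_diversity_columns(csv_text):
    """Append total abundance and the diversity index to every row (plot)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        counts = [float(value) for value in row]
        writer.writerow(row + [f"{sum(counts):g}", f"{shannon_index(counts):.4f}"])
    return out.getvalue()


print(add_diversity_columns("10,10\n20,0\n"))
```

A plot with two equally abundant species gives H = ln 2, while a plot with a single species gives H = 0.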

Author's Homepage:

Script File:
shanon.pl.gz (228 bytes)

dp.pl

Description:
This Perl script reads a text file containing a list of PDB IDs. It downloads the FASTA-formatted sequence file for each PDB ID in the text file from the RCSB website, and downloads the corresponding domain information from the NCBI MMDB website. PSIPRED and the domain prediction program PPRODO are then run on each PDB ID. The results of the domain prediction and the domain information from MMDB are written to a text file. This script was written for testing domain prediction software, and it may serve as a good example for writing similar scripts. PSIPRED, PPRODO, and a network connection are required by this script. PPRODO can be found at http://gene.kias.re.kr/~jlee/pprodo/

Author's Email:
paul_a_wilson@mac.com

Author's Full Name:
Paul A. Wilson

Author's Homepage:
http://homepage.mac.com/paul_a_wilson/

Script File:
dp.pl.tar.gz (4.55 KB)