
a good lapply() tutorial



Using lapply() to Change Multiple Dataframes with One Call


# a dataframe
a <- data.frame(x = 1:3, y = 4:6)

# make a list of several dataframes, then apply a function (e.g., change column names):
my.list <- list(a, a)
my.list <- lapply(my.list, function(x) {names(x) <- c("a", "b"); x})

# save the dataframes to csv (1.csv, 2.csv, ...) with a similar lapply call;
# [[ni]] extracts the dataframe itself rather than a one-element list
invisible(lapply(seq_along(my.list), function(ni) {
  write.table(my.list[[ni]], file = paste0(ni, ".csv"),
              sep = ";", row.names = FALSE)
}))
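
The reverse direction works the same way. The following is a minimal sketch, not part of the original tutorial, that writes two small dataframes to a temporary directory and reads them back into a list with lapply(). The names out.dir, files, and read.back are made up for illustration.

```r
# Two small dataframes to round-trip through disk
a <- data.frame(a = 1:3, b = 4:6)
my.list <- list(a, a)

# Illustrative output location: the session's temporary directory
out.dir <- tempdir()
files <- file.path(out.dir, paste0(seq_along(my.list), ".csv"))

# Map() pairs each dataframe with its file name
invisible(Map(function(df, f) {
  write.table(df, file = f, sep = ";", row.names = FALSE)
}, my.list, files))

# Read every file back; the result is again a list of dataframes
read.back <- lapply(files, read.table, sep = ";", header = TRUE)
names(read.back) <- basename(files)
str(read.back)
```

Naming the list elements after the files makes it easier to tell the dataframes apart later.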

Wednesday, March 28, 2012

i-ADHoRe - a comparative genomics tool

A tool to unveil genomic homology by identifying conservation of gene content and gene order (collinearity).



NGS-SNP - a good tool for SNP annotation

In-depth annotation of SNPs arising from resequencing projects using NGS-SNP


NGS-SNP - Overview

Citing NGS-SNP

Grant JR, Arantes AS, Xiaoping L, Stothard P (2011) In-depth annotation of SNPs arising from resequencing projects using NGS-SNP. Bioinformatics 27:2300-2301.

Description

NGS-SNP is a collection of command-line scripts for providing rich annotations for SNPs identified by the sequencing of transcripts or whole genomes from organisms with reference sequences in Ensembl. Included among the annotations, several of which are not available from any existing SNP annotation tools, are the results of detailed comparisons with orthologous sequences. These comparisons allow, for example, SNPs to be sorted or filtered based on how drastically the SNP changes the score of a protein alignment. Other fields indicate the names of overlapping protein domains or features, and the conservation of both the SNP site and flanking regions. NCBI, Ensembl, and Uniprot IDs are provided for genes, transcripts, and proteins when applicable, along with Gene Ontology terms, a gene description, phenotypes linked to the gene, and an indication of whether the SNP is novel or known. A “Model_Annotations” field provides several annotations obtained by transferring in silico the SNP to an orthologous gene, typically in a well-characterized species.

NGS-SNP scripts

  • annotate_SNPs.pl - used to annotate SNPs identified by the sequencing of genomic DNA or transcripts.
  • merge_and_sort_SNP_lists.pl - used to filter, merge, and sort SNP lists annotated using NGS-SNP.
  • cDNA_library_entropy.pl - used to choose the best tissues for SNP discovery by mRNA sequencing.
  • obtain_reference_chromosomes.pl - used to obtain reference chromosome sequences from Ensembl that can be supplied to SNP discovery tools such as Maq.
  • obtain_reference_transcripts.pl - used to obtain reference transcript sequences from Ensembl that can be supplied to SNP discovery tools such as Maq.
  • get_genes_in_area.pl - used to obtain information about genes located within or nearby CNVs or other variants supplied as input.
  • ncbi_monitor.pl - used to obtain publications related to genome regions supplied as input.

Using NGS-SNP

  1. Set up NGS-SNP. Note that the simplest approach is to follow the "Linux virtual machine" section of the installation guide.
  2. Obtain a list of SNPs from SAMtools, Maq, the AB diBayes SNP package, or some other SNP calling software. The SNP list formats that can be parsed by the annotate_SNPs.pl script are described in the annotate_SNPs.pl documentation.
  3. Annotate the SNP list using the annotate_SNPs.pl script.
The following commands illustrate a typical NGS-SNP session in which SNPs are annotated and then scored (sample data included with NGS-SNP is analyzed):
cd NGS-SNP/scripts

perl annotate_SNPs/annotate_SNPs.pl -s bos_taurus -cs Homo_sapiens \
Mus_musculus -v -matrix annotate_SNPs/data/blosum62.mat -i \
annotate_SNPs/test_input/bovine_GA_maq_transcripts.tab -o annotated_snps.tab
        
For more information on the options available, input formats, and output formats, see the documentation for each script. Each script also comes with sample input and output files, located in directories called test_input and sample_output, respectively.

Using a local Ensembl database

NGS Catalog - a resource collection for NGS in human

http://bioinfo.mc.vanderbilt.edu/NGS/software.html

ABEL - R packages for genetic studies

http://www.genabel.org/tutorials/ABEL-tutorial


Packages

GenABEL, or *ABEL, is an umbrella name for a number of software packages aiming to facilitate statistical analyses of polymorphic genome data. This is a rich program set which now allows very flexible genome-wide association (GWA) analysis (GenABEL, ProbABEL, MixABEL), meta-analysis (MetABEL), parallelization of GWA analyses (ParallABEL), management of very large files (DatABEL), and facilitates evaluation of prediction (PredictABEL).
Most likely, you only need one of the packages for your specific task. Figure out which one you need, install, and use! If you have questions, please refer to the "Support" section of this web-site.
The code for the latest development versions of all packages is available from GenABEL on R-forge.
For stable releases, use the CRAN version for R packages or the links provided at this web-site.

GenABEL

Genome-wide association analysis for quantitative, binary and time-till-event traits

MetABEL

Meta-analysis of genome-wide SNP association results

ProbABEL

Genome-wide association analysis of imputed data

PredictABEL

Assess the performance of risk models for binary outcomes

DatABEL

File-based access to large matrices stored on HDD in binary format

ParallABEL

Generalized parallelization of Genome-Wide Association Studies

MixABEL

More mixed models for genome-wide association analysis; experimenting with GSL, multiple input formats, iterator, parallelization through threads.

VariABEL

Genome-wide variance heterogeneity analysis as a tool for identification of potentially interacting SNPs.

To mean or suggest - imply

Interestingly, increased methylation of the hybrid genomes predominantly occurred in regions that were differentially methylated in the two parents and covered by small RNAs, implying that the RNA-directed DNA methylation (RdDM) pathway may direct DNA methylation in hybrids.

To exert influence - exert influence by doing sth.

They exert their influence by guiding mRNA cleavage, translational repression, and chromatin modification.

Model-based CpG islands for multiple species - R package

makeCGI is an R software package to obtain CpG islands (CGI) from a genome. It iteratively fits two HMMs, one on GC content and one on the observed-to-expected CpG ratio, and obtains posterior probabilities for genomic regions being CpG islands. The CpG islands are then defined by thresholding the posterior probabilities.


The software package can be downloaded from the page linked below. It depends on the BSgenome and Biostrings Bioconductor packages. The input DNA sequence can be either a BSgenome package or a text file in fa (FASTA) format. Follow the steps given there to use the package.


http://rafalab.jhsph.edu/CGI/index.html
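
To make the two quantities the HMMs are fitted on more concrete, here is a small base-R sketch that computes GC content and the observed-to-expected CpG ratio for a single sequence. It is not makeCGI code: the helper cpg_stats is hypothetical, and it assumes the commonly used ratio (number of CpG × sequence length) / (number of C × number of G).

```r
# Hypothetical helper (not part of makeCGI): GC content and
# observed/expected CpG ratio for one DNA string.
cpg_stats <- function(dna) {
  bases <- strsplit(toupper(dna), "")[[1]]
  n <- length(bases)
  n_c <- sum(bases == "C")
  n_g <- sum(bases == "G")
  # Count "CG" dinucleotides by comparing each base with its successor
  n_cg <- sum(bases[-n] == "C" & bases[-1] == "G")
  obs_exp <- if (n_c > 0 && n_g > 0) n_cg * n / (n_c * n_g) else NA_real_
  list(gc_content = (n_c + n_g) / n, obs_exp_cpg = obs_exp)
}

# Example call on a short, made-up sequence
cpg_stats("ACGCGCGTATACGCGG")
```

In practice these statistics would be computed over windows along a chromosome rather than on one short string; the sketch only illustrates the definitions.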



export SVG plot from R


RSvgDevice

install.packages("RSvgDevice") # one time

library(RSvgDevice)

# open the SVG device, draw the plot, then close the device to write the file
devSVG(file = "testRplots.svg", width = 10, height = 8,
       bg = "white", fg = "black", onefile = TRUE, xmlHeader = TRUE)
plot(1:11, (-5:5)^2, type = "b", main = "Simple Example Plot")
dev.off()
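
If RSvgDevice is not available, base R's svg() device from grDevices can produce a similar file. This is a sketch under the assumption that the R build has cairo support, which the code checks first. The output path in tempdir() is chosen only for illustration.

```r
# Illustrative output path in the session's temporary directory
out.file <- file.path(tempdir(), "testRplots.svg")

# svg() relies on cairo; skip the plot if this R build lacks it
if (capabilities("cairo")) {
  svg(filename = out.file, width = 10, height = 8, bg = "white")
  plot(1:11, (-5:5)^2, type = "b", main = "Simple Example Plot")
  dev.off()
}
```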

Geographic locations of accessions - Arabidopsis thaliana

You can find this information in the 149SNPFiltered folder, from this web page.

Genetics and Genomics of the Brassicaceae


  • Brassicaceae in Agriculture, pp. 67-121
  • The Non-coding Landscape of the Genome of Arabidopsis thaliana, pp. 123-151
  • Natural Variation in Arabidopsis thaliana, pp. 153-170
  • Chasing Ghosts: Comparative Mapping in the Brassicaceae, pp. 171-194
  • Comparative Genome Analysis at the Sequence Level in the Brassicaceae, pp. 195-214
  • Structural and Functional Evolution of Resynthesized Polyploids, pp. 215-260
  • Genetics of Brassica rapa L., pp. 261-289
  • The Genetics of Brassica oleracea, pp. 291-322
  • The Genetics of Brassica napus, pp. 323-345
  • Genetics of Brassica juncea, pp. 347-372
  • Arabidopsis lyrata Genetics, pp. 373-387
  • The Genetics of Capsella, pp. 389-411
  • Self-Incompatibility in the Brassicaceae, pp. 413-435
  • Sequencing the Gene Space of Brassica rapa, pp. 437-467
  • Germplasm and Molecular Resources, pp. 469-503
  • Resources for Metabolomics, pp. 505-525
  • Transformation Technology in the Brassicaceae, pp. 527-560
  • Resources for Reverse Genetics Approaches in Arabidopsis thaliana, pp. 561-583
  • Resources for Reverse Genetics Approaches in Brassica Species, pp. 585-596
  • Bioinformatics Resources for Arabidopsis thaliana, pp. 597-615
  • Bioinformatics Resources for the Brassica Species, pp. 617-632
  • Perspectives on Genetics and Genomics of the Brassicaceae, pp. 633-677





Monday, March 19, 2012

To target a question - bear on these questions

We bring previously overlooked fossil evidence to bear on these questions and find the split between A. thaliana and Arabidopsis lyrata occurred about 13 Mya, and that the split between Arabidopsis and the Brassica complex (broccoli, cabbage, canola) occurred about 43 Mya.

Some online resources for R

1. http://personality-project.org/r/r.guide.html#manipulate
Using R for psychological research

2. http://biostat.mc.vanderbilt.edu/s/finder/finder.html
R/S functions finder

3. http://www.psych.upenn.edu/~baron/rpsych/rpsych.htm
Notes on the use of R for psychology experiments and questionnaires