Thursday, March 29, 2012

sequencing extremophiles - sequencing plants living in extreme climates


Life at the extreme: lessons from the genome


This reminds me of the plateau plants of Tibet.



some useful web sites for plant genomic studies

1. Phytozome
2. PLAZA
3. CoGe
4. PlantGDB

a good lapply() tutorial



Using lapply() to Change Multiple Dataframes with One Call


# a data frame
a <- data.frame(x = 1:3, y = 4:6)

# make a list of several data frames, then apply a function to each (e.g., change column names):
my.list <- list(a, a)
my.list <- lapply(my.list, function(x) {names(x) <- c("a", "b"); x})

# save each data frame to "<index>.csv" with a similar lapply() call (";"-separated):
invisible(lapply(seq_along(my.list), function(ni) {
  write.table(my.list[[ni]], file = paste0(ni, ".csv"),
              sep = ";", row.names = FALSE)
}))

List of sequenced eukaryotic genomes - from Wikipedia

http://en.wikipedia.org/wiki/List_of_sequenced_eukaryotic_genomes

Wednesday, March 28, 2012

i-ADHoRe - a comparative genomics tool

A tool to unveil genomic homology by identifying conserved gene content and gene order (collinearity).



bioinformatics and systems biology at Ghent University

This division incorporates several excellent researchers.

1. bioinformatics and evolutionary genomics
2. microbial systems biology
3. evolutionary systems biology
4. comparative and integrative genomics

Tuesday, March 27, 2012

Bayesian phylogenetics with BEAUti and the BEAST 1.7






Heterogeneity of the Transition/Transversion Ratio in Drosophila and Hominidae Genomes



1001 Proteomes of Arabidopsis thaliana


1001 Proteomes: A functional proteomics portal for the analysis of Arabidopsis thaliana accessions

network and integrating biological data


Network biology methods integrating biological data for translational science

jPopGen Suite: population genetic analysis of DNA polymorphism from nucleotide sequences with errors


https://sites.google.com/site/jpopgen/

human evolution - three references

1. Report on the symposium on Modern Human Genetic Variation

2. http://kva.se/en/Events-List/Event/?eventId=362

3. The agricultural “express train”

Human Evolution Out of Africa: The Role of Refugia and Climate Change





Monday, March 26, 2012

Introduction to text manipulation on UNIX-based systems

http://www.ibm.com/developerworks/aix/library/au-unixtext/index.html

NGS-SNP - a good tool for SNP annotation

In-depth annotation of SNPs arising from resequencing projects using NGS-SNP


NGS-SNP - Overview

Citing NGS-SNP

Grant JR, Arantes AS, Liao X, Stothard P (2011) In-depth annotation of SNPs arising from resequencing projects using NGS-SNP. Bioinformatics 27:2300-2301.

Description

NGS-SNP is a collection of command-line scripts for providing rich annotations for SNPs identified by the sequencing of transcripts or whole genomes from organisms with reference sequences in Ensembl. Included among the annotations, several of which are not available from any existing SNP annotation tools, are the results of detailed comparisons with orthologous sequences. These comparisons allow, for example, SNPs to be sorted or filtered based on how drastically the SNP changes the score of a protein alignment. Other fields indicate the names of overlapping protein domains or features, and the conservation of both the SNP site and flanking regions. NCBI, Ensembl, and Uniprot IDs are provided for genes, transcripts, and proteins when applicable, along with Gene Ontology terms, a gene description, phenotypes linked to the gene, and an indication of whether the SNP is novel or known. A “Model_Annotations” field provides several annotations obtained by transferring in silico the SNP to an orthologous gene, typically in a well-characterized species.

NGS-SNP scripts

  • annotate_SNPs.pl - used to annotate SNPs identified by the sequencing of genomic DNA or transcripts.
  • merge_and_sort_SNP_lists.pl - used to filter, merge, and sort SNP lists annotated using NGS-SNP.
  • cDNA_library_entropy.pl - used to choose the best tissues for SNP discovery by mRNA sequencing.
  • obtain_reference_chromosomes.pl - used to obtain reference chromosome sequences from Ensembl that can be supplied to SNP discovery tools such as Maq.
  • obtain_reference_transcripts.pl - used to obtain reference transcript sequences from Ensembl that can be supplied to SNP discovery tools such as Maq.
  • get_genes_in_area.pl - used to obtain information about genes located within or nearby CNVs or other variants supplied as input.
  • ncbi_monitor.pl - used to obtain publications related to genome regions supplied as input.

Using NGS-SNP

  1. Set up NGS-SNP. Note that the simplest approach is to follow the "Linux virtual machine" section of the installation guide.
  2. Obtain a list of SNPs from SAMtools, Maq, the AB diBayes SNP package, or some other SNP calling software. The SNP list formats that can be parsed by the annotate_SNPs.pl script are described in the annotate_SNPs.pl documentation.
  3. Annotate the SNP list using the annotate_SNPs.pl script.
The following commands illustrate a typical NGS-SNP session in which SNPs are annotated and then scored (sample data included with NGS-SNP is analyzed):
cd NGS-SNP/scripts

perl annotate_SNPs/annotate_SNPs.pl -s bos_taurus -cs Homo_sapiens \
Mus_musculus -v -matrix annotate_SNPs/data/blosum62.mat -i \
annotate_SNPs/test_input/bovine_GA_maq_transcripts.tab -o annotated_snps.tab
        
For more information on the options available, input formats, and output formats, see the documentation for each script. Each script also comes with sample input and output files, located in directories called test_input and sample_output, respectively.

Using a local Ensembl database

NGS Catalog - a resource collection for NGS in human

http://bioinfo.mc.vanderbilt.edu/NGS/software.html

How-to Extract Text From Multiple Websites with R



Saturday, March 24, 2012

Unix and Python tutorials from the same author

http://pythonstarter.blogspot.de/

http://unstableme.blogspot.de/

plotGoogleMap - easily plot on Google Maps

 plotGoogleMap

genome sequencing for huge genomes


Flow cytometric chromosome sorting in plants: The next generation

ABEL - R packages for genetic studies

http://www.genabel.org/tutorials/ABEL-tutorial


Packages

GenABEL, or *ABEL, is an umbrella name for a number of software packages aiming to facilitate statistical analyses of polymorphic genome data. This rich set of programs now allows very flexible genome-wide association (GWA) analysis (GenABEL, ProbABEL, MixABEL), meta-analysis (MetABEL), parallelization of GWA analyses (ParallABEL), management of very large files (DatABEL), and evaluation of prediction (PredictABEL).
Most likely, you only need one of the packages for your specific task. Figure out which one you need, install, and use! If you have questions, please refer to the "Support" section of this web-site.
The code for the latest development versions of all packages is available from GenABEL on R-Forge.
For stable releases, use the CRAN version for R packages or the links provided at this web-site.

GenABEL

Genome-wide association analysis for quantitative, binary and time-till-event traits

MetABEL

Meta-analysis of genome-wide SNP association results

ProbABEL

Genome-wide association analysis of imputed data

PredictABEL

Assess the performance of risk models for binary outcomes

DatABEL

File-based access to large matrices stored on HDD in binary format

ParallABEL

Generalized parallelization of Genome-Wide Association Studies

MixABEL

More mixed models for genome-wide association analysis; experimenting with GSL, multiple input formats, iterator, parallelization through threads.

VariABEL

Genome-wide variance heterogeneity analysis as a tool for identification of potentially interacting SNPs.

Plotting grouped data vs time with error bars in R



Friday, March 23, 2012

Biological Data Modelling and Scripting in R


workshop materials from Montreal R user group

https://sites.google.com/site/mcgillbgsa/

expressing "to mean" - imply

Interestingly, increased methylation of the hybrid genomes predominantly occurred in regions that were differentially methylated in the two parents and covered by small RNAs, implying that the RNA-directed DNA methylation (RdDM) pathway may direct DNA methylation in hybrids.

expressing "to exert influence" - exert influence by doing sth.

They exert their influence by guiding mRNA cleavage, translational repression, and chromatin modification.

Thursday, March 22, 2012

TopHat-Fusion: an algorithm for discovery of novel fusion transcripts

http://genomebiology.com/2011/12/8/R72

http://tophat-fusion.sourceforge.net/index.html

model-based CpG islands for multiple species - R package

makeCGI is an R software package to obtain CpG islands (CGI) from a genome. It iteratively fits two HMMs, on GC content and on the observed-to-expected CpG ratio, and obtains posterior probabilities for genomic regions being CpG islands. The CpG islands are then defined by thresholding the posterior probabilities.

The software package can be downloaded from the project page. It depends on the BSgenome and Biostrings Bioconductor packages. The input DNA sequence can be either a BSgenome package or a text file in FASTA (.fa) format. See the project page for the usage steps:


http://rafalab.jhsph.edu/CGI/index.html
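
To make the two quantities the HMMs are fitted on more concrete, here is a minimal R sketch that computes GC content and the observed-to-expected CpG ratio for a single sequence. This is not makeCGI's code or API: `cpg_stats` is a hypothetical helper, and the O/E formula (CpG count x length / (C count x G count)) is the commonly used definition, assumed here rather than taken from the package.

```r
# Hypothetical helper (not part of makeCGI): GC content and CpG O/E for one DNA string.
# Assumed definition: O/E = (#CpG * N) / (#C * #G), with N the sequence length.
cpg_stats <- function(dna) {
  bases <- strsplit(toupper(dna), "")[[1]]
  n     <- length(bases)
  n_c   <- sum(bases == "C")
  n_g   <- sum(bases == "G")
  # Count CpG dinucleotides: a C immediately followed by a G
  n_cpg <- if (n > 1) sum(bases[-n] == "C" & bases[-1] == "G") else 0
  gc    <- (n_c + n_g) / n
  # O/E is undefined when the sequence has no C or no G
  oe    <- if (n_c > 0 && n_g > 0) n_cpg * n / (n_c * n_g) else NA_real_
  list(gc_content = gc, cpg_oe = oe)
}

# Toy sequence: 3 C, 3 G, 3 CpG in 12 bases -> gc_content = 0.5, cpg_oe = 4
cpg_stats("ACGTCGCGAATT")
```

In practice these values would be computed in windows along the genome; the windowing and the HMM fitting are what the package itself handles.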



export SVG plot from R


RSvgDevice

install.packages("RSvgDevice")  # one time

library(RSvgDevice)

devSVG(file = "testRplots.svg", width = 10, height = 8,
       bg = "white", fg = "black", onefile = TRUE, xmlHeader = TRUE)
plot(1:11, (-5:5)^2, type = "b", main = "Simple Example Plot")
dev.off()

Time Series Analysis and Its Applications: With R Examples



introduction to plot options in R

http://isomorphismes.tumblr.com/post/19688088245/base-plot-r-options-par

nice spatial plot

http://stevemosher.wordpress.com/2012/03/22/metadata-dubuque-and-uhi/




blogs

http://bioinfoblog.it/

http://kbroman.wordpress.com/

http://sunshinehours.wordpress.com/

http://stevemosher.wordpress.com/

http://www.rcasts.com/

http://davenportspatialanalytics.squarespace.com/blog/

http://blogs.discovermagazine.com/gnxp/

Wednesday, March 21, 2012

an excellent evolutionary blog

http://bioinfoblog.it/

Trello - a tool to organize your projects

https://trello.com/board/my-ideas/4f69eb9588299a057b03c7a6

Introduction to Unix systems for Evolutionary Biologists



find bugs in your R scripts


Introduction to R Programming, Part 7: Debugging

automatically detect and efficiently remove sequence contamination from genomic and metagenomic datasets

http://deconseq.sourceforge.net/manual.html

accessions geographic locations - Arabidopsis thaliana

You can find this information in the 149SNPFiltered folder on this web page.

Genetics and Genomics of the Brassicaceae




Brassicaceae in Agriculture
The Non-coding Landscape of the Genome of Arabidopsis thaliana
Natural Variation in Arabidopsis thaliana
Chasing Ghosts: Comparative Mapping in the Brassicaceae
Comparative Genome Analysis at the Sequence Level in the Brassicaceae
Structural and Functional Evolution of Resynthesized Polyploids
Genetics of Brassica rapa L.
The Genetics of Brassica oleracea
The Genetics of Brassica napus
Genetics of Brassica juncea
Arabidopsis lyrata Genetics
The Genetics of Capsella
Self-Incompatibility in the Brassicaceae
Sequencing the Gene Space of Brassica rapa
Germplasm and Molecular Resources
Resources for Metabolomics
Transformation Technology in the Brassicaceae
Resources for Reverse Genetics Approaches in Arabidopsis thaliana
Resources for Reverse Genetics Approaches in Brassica Species
Bioinformatics Resources for Arabidopsis thaliana
Bioinformatics Resources for the Brassica Species
Perspectives on Genetics and Genomics of the Brassicaceae





check Disk Usage and Free Space - unix command

http://unix-school.blogspot.de/2012/03/disk-usage-and-free-space-housekeeping.html

Join every 2 lines in a file - unix commands

http://unix-school.blogspot.de/2012/03/join-every-2-lines-in-file.html
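
The linked post covers Unix command-line ways to do this. For comparison with the R snippets elsewhere in these notes, a rough R equivalent might look like the sketch below. It is not taken from the post; `join_pairs` and the file names in the comment are hypothetical.

```r
# Hypothetical helper: join every 2 consecutive lines into one.
# An unpaired last line (odd number of lines) is kept as-is.
join_pairs <- function(lines, sep = " ") {
  # Group index 1,1,2,2,3,3,... so each consecutive pair shares a group
  grp <- ceiling(seq_along(lines) / 2)
  unname(vapply(split(lines, grp), paste, character(1), collapse = sep))
}

join_pairs(c("a", "b", "c", "d", "e"))
# returns "a b" "c d" "e"

# On a file (assumed names):
# writeLines(join_pairs(readLines("input.txt")), "output.txt")
```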

Monday, March 19, 2012

expressing "to target a question" - bear on these questions

We bring previously overlooked fossil evidence to bear on these questions and find the split between A. thaliana and Arabidopsis lyrata occurred about 13 Mya, and that the split between Arabidopsis and the Brassica complex (broccoli, cabbage, canola) occurred about 43 Mya.

some online resources for R

1. http://personality-project.org/r/r.guide.html#manipulate
Using R for psychological research

2. http://biostat.mc.vanderbilt.edu/s/finder/finder.html
R/S functions finder

3. http://www.psych.upenn.edu/~baron/rpsych/rpsych.htm
Notes on the use of R for psychology experiments and questionnaires

Arabidopsis thaliana - genome level variation studies

1. Whole-genome sequencing of multiple Arabidopsis thaliana populations
Whole genome, 80 individuals

2. Multiple reference genomes and transcriptomes for Arabidopsis thaliana
    http://mus.well.ox.ac.uk/19genomes/
Whole genome, 19 individuals

3. Genome-wide patterns of genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap panel
    http://bergelson.uchicago.edu/regmap-data/regmap.html/
250k SNPs, >1000 individuals


4. The Scale of Population Structure in Arabidopsis thaliana
    https://cynin.gmi.oeaw.ac.at/home/resources/atpolydb

149 SSRs, >5000 individuals

5. Source verification of mis-identified Arabidopsis thaliana accessions




Multiple reference genomes and transcriptomes for Arabidopsis thaliana




a new way of combining population genetics and phylogenetics models of natural selection

A population genetics-phylogenetics approach to inferring natural selection