cartographic plot of population genetic data and other numerical characters


MDStudio2 is a program for cartographic analysis of population genetic data and other numerical characters distributed in geographic space. The analysis is based on methods for interpolating and constructing contiguous geographical fields, as well as methods for mathematical transformation and statistical analysis of geographic distributions.


colors in R


pairwise sequence alignment with R


Finding Data on the Internet


What I would like is a nice list of all the credible sources on the Internet for finding data to use with R projects. I know that this is a crazy idea, not well formulated (what are data, after all?) and loaded with absurd computational and theoretical challenges. (Why can't I just google "data R" and get what I want?) So, what can I do? As many other people are also doing, I can begin to make lists (in many cases lists of lists) on a platform that is stable enough to survive and grow, and perhaps encourage others to help with the effort.
Here follows a list of data sources that may easily be imported into R.
If an (R) appears after a source, the data are already in R format or there exist R commands for importing the data directly from within R. (See http://www.quantmod.com/examples/intro/ for some code.) Otherwise, I have limited the list to data sources for which there is a reasonably simple process for importing csv files. What follows is a list of data sources organized into categories that are not mutually exclusive but which reflect what's out there.


UMD: http://inforumweb.umd.edu/econdata/econdata.html
World bank: http://data.worldbank.org/indicator


CBOE Futures Exchange: http://cfe.cboe.com/Data/
Google Finance: http://finance.yahoo.com/ (R)
Google Trends: http://www.google.com/trends?q=google&ctab=0&geo=all&date=all&sort=0
St Louis Fed: http://research.stlouisfed.org/fred2/ (R)
NASDAQ: https://data.nasdaq.com/
OANDA: http://www.oanda.com/ (R)
Yahoo Finance: http://finance.yahoo.com/ (R)


Archived national government statistics: http://www.archive-it.org/
Australia: http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/3301.02009?OpenDocument
Canada: http://www.data.gc.ca/default.asp?lang=En&n=5BCD274E-1
DataMarket: http://datamarket.com/
Fed Stats: http://www.fedstats.gov/cgi-bin/A2Z.cgi
Guardian world governments: http://www.guardian.co.uk/world-government-data
London, U.K. data: http://data.london.gov.uk/catalogue
New Zealand: http://www.stats.govt.nz/tools_and_services/tools/TableBuilder/tables-by...
NYC data: http://nycplatform.socrata.com/
OECD: http://www.oecd.org/document/0,3746,en_2649_201185_46462759_1_1_1_1,00.html
San Francisco Data sets: http://datasf.org/
U.K. Government Data: http://data.gov.uk/data
United Nations: http://data.un.org/
U.S. Federal Government Agencies: http://www.data.gov/metric
US CDC Public Health datasets: http://www.cdc.gov/nchs/data_access/ftp_data.htm
The World Bank: http://wdronline.worldbank.org/

Machine Learning

Causality Workbench: http://www.causality.inf.ethz.ch/repository.php
Kaggle competition data: http://www.kaggle.com/
KDNuggets competition site: www.kdnuggets.com/datasets/
UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/
Machine Learning Data Set Repository: http://mldata.org/
Microsoft Research: http://research.microsoft.com/apps/dp/dl/downloads.aspx
Million songs: http://blog.echonest.com/post/3639160982/million-song-dataset
Social Networking: http://www.cs.cmu.edu/~jelsas/data/ancestry.com/

Public Domain Collections

Data360: http://www.data360.org/index.aspx
Datamob.org: http://datamob.org/datasets
Factual: http://www.factual.com/topics/browse
Freebase: http://www.freebase.com/
Google: http://www.google.com/publicdata/directory
infochimps: http://www.infochimps.com/
numbrary: http://numbrary.com/
Sample R data sets: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/00Index.html (R)
SourceForge Research Data: http://www.nd.edu/~oss/Data/data.html
UFO Reports: http://www.nuforc.org/webreports.html
Wikileaks 911 pager intercepts: http://911.wikileaks.org/files/index.html
Stats4Stem.org: R data sets: http://www.stats4stem.org/data-sets.html (R)
The Washington Post List: http://www.washingtonpost.com/wp-srv/metro/data/datapost.html


Agricultural Experiments: http://www.inside-r.org/packages/cran/agridat/docs/agridat (R)
Climate data: http://www.cru.uea.ac.uk/cru/data/temperature/#datter and ftp://ftp.cmdl.noaa.gov/
Gene Expression Omnibus: http://www.ncbi.nlm.nih.gov/geo/
Geo Spatial Data: http://geodacenter.asu.edu/datalist/
Human Microbiome Project: http://www.hmpdacc.org/reference_genomes/reference_genomes.php
MIT Cancer Genomics Data: http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi
NASA: http://nssdc.gsfc.nasa.gov/nssdc/obtaining_data.html
NIH Microarray data: ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE6532/ (R)
Protein structure: http://www.infobiotic.net/PSPbenchmarks/
Public Gene Data: http://www.pubgene.org/
Stanford Microarray Data: http://smd.stanford.edu//

Social Sciences

General Social Survey: http://www3.norc.org/GSS+Website/
ICPSR: http://www.icpsr.umich.edu/icpsrweb/ICPSR/access/index.jsp
UCLA Social Sciences Archive: http://dataarchives.ss.ucla.edu/Home.DataPortals.htm
UPJOHN INST: http://www.upjohn.org/erdc/erdc.html

Time Series

Time Series data Library: http://robjhyndman.com/TSDL/


Carnegie Mellon University Enron email: http://www.cs.cmu.edu/~enron/
Carnegie Mellon University StatLab: http://lib.stat.cmu.edu/datasets/
Carnegie Mellon University JASA data archive: http://lib.stat.cmu.edu/jasadata/
Ohio State University Financial data: http://fisher.osu.edu/fin/osudata.htm
UC Berkeley: http://ucdata.berkeley.edu/
UCLA: http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data
UC Riverside Time Series: http://www.cs.ucr.edu/~eamonn/time_series_data/
University of Toronto: http://www.cs.toronto.edu/~delve/data/datasets.html

China's history of 5000 years - in songs



Mnemonic rhyme no. 2 (recorded by 李嘉清 of Dallas)






Mnemonic rhyme no. 5 (1972-1973, from teacher 劉延森 of 古晉中學, Kuching, Malaysia)



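Pangu splitting heaven from earth is handed down as myth, the Three Sovereigns and Five Emperors span thousands of years.
The Yan Emperor and the Yellow Emperor are the ancestors of the Huaxia, Yao, Shun and Yu passed the throne to the worthy.
Xia, Shang and Western Zhou were slave societies, the states of the Eastern Zhou turned feudal.
Qin and Han unified the land and opened up the frontiers, the strife of the Three Kingdoms brought war and chaos.
Western Jin, Eastern Jin, then the Northern and Southern Dynasties, under Sui and Tang the territory expanded again.
The Five Dynasties and Ten Kingdoms split into rival regimes, Song, Liao, Xia and Jin all fell to the Great Yuan.
The Ming fleets sailed the Western Oceans, the Qing closed the country and had its gates forced open.
The Republic saw civil war on top of foreign war, the People's Republic opened a new chapter.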














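Wei, Jin, Northern and Southern Dynasties; Sui, Tang, Five Dynasties and Ten Kingdoms
Song, Yuan, Ming, Qing; Republic of China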


see outside of what you are allowed - anti-blocking tools for internet surfing


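100 free circumvention tools

100 free circumvention tools and how to use them: Sec. of State Hillary Clinton Discusses on Internet Freedom

I. Online proxy sites (66)

The Aniscartujo online proxy works on both computers and phones.
2. Free Web Proxy
Free Web Proxy lets you watch YouTube videos and download them as MP4 files.
Cannot play YouTube videos.
Cannot play YouTube videos.
Surfagain.com can play YouTube videos.
6. Online Sonic
Online Sonic converts the sites you visit directly into a French version.
The free Magaproxy proxy has no pop-up ads.
8. Shield Proxy
9. Psiphon 2
Psiphon 2 is invitation-only, and its site address changes frequently.
With Glype you can set up your own online proxy site; you only need to upload its script to a server.
Circumventor provides a link to an online proxy on its site; if the site behind that link is blocked, you can enter your email address to subscribe to its other online proxies.
CamoList.com lists more than 30 online proxies, all of which can be used to watch YouTube videos: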
  • artclassdrama.com
  • browse007.com
  • browse007.info
  • camo1.info
  • classwork101.com
  • coolkidsonly.org
  • ditchthetests.com
  • downwithitall.com
  • dumbdream.com
  • enoughschool.com
  • erasermaker.com
  • forgotmybooks.com
  • getus.in
  • goodgradesforme.com
  • gumunderthedesk.com
  • gymtimestories.com
  • hiddentunnel.net
  • hidemy.biz
  • letmethruthis.com
  • nobodycanstop.us
  • noclasswork.com
  • noneedhallpass.com
  • nowaytoknow.com
  • plzhidemy.info
  • rebelbrowse.com
  • schoolisgood.com
  • showsomewisdom.com
  • slaptheblock.com
  • sneakmyass.in
  • starscantshine.com
  • studybreakneeded.com
  • studyhardplayharder.com
  • theunblocked.com
  • tothedeans.com
  • tunnel007.com
  • wecantfocus.com
Polysolve.com provides links to 18 online proxies (including itself):
  • Atunnel.com
  • Backfox.com
  • Btunnel.com
  • Calculatepie.com
  • Ctunnel.com
  • Dtunnel.com
  • Englishtunnel.com
  • Geotunnel.com
  • Mathtunnel.com
  • Newbackdoor.com
  • Polysolve.com
  • SafeForWork.net
  • Safehazard.com
  • Safelizard.com
  • USAtunnel.com
  • Vmathpie.com
  • VPNTunnel.net
  • Vtunnel.com
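These 18 proxies are reasonably fast, but they cannot play YouTube videos and they carry a lot of ads.
66. Proxy.org (100+)
Proxy.org lists more than 100 working free online proxies.

II. Client-side proxies (11)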

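  • 自由门 (Freegate; for Windows, Windows Mobile, and some Java phones)
  • 逍遥游 (similar to 自由门, provided by the same site)
  • 动网通 (similar to 自由门, provided by the same site)
  • 无界 (UltraSurf; for Windows)
  • GTunnel (for Windows)
  • Tor (for Windows, Mac, Linux, Android, iOS, and Nokia)
  • GappProxy (for Windows and Linux)
  • Hyk-proxy (for Windows, Linux, and Mac)
  • Your Freedom (for Windows and Mac)
  • GPass (for Windows)
  • HTTP-Tunnel (for Windows)
When using any of the 11 proxy programs above, you need to change the proxy address in your browser's network settings; the following extensions make that change easier and can apply it only to blocked sites: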

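III. VPN (13)

Proxy tools only take effect for applications whose network connection has been configured to use them, and the vast majority work only on computers; a VPN (Virtual Private Network), by contrast, takes effect for all programs, and most work on both computers and phones.
a. Free VPNs that need no installation (6)
The vast majority of VPNs that can be used without installation work on computers and smartphones running any system:
Of the six VPNs above, all except USA IP work the same way: set the server address, enter the account username and password, and you can start using it.
b. Free VPN clients (7)
VPN clients that require installation mostly work only on computers:
Of the 7 VPN programs above, Free OpenVPN Service works on the widest range of systems.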

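IV. SSH (3)

As mentioned earlier, free SSH is not stable, but you can still try the following three:
All three SSH services above are in Chinese.
Once you have an SSH account, you still need a tool that can use it, such as Tunnelier for Windows or Issh for Mac.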

V. Browsers (3)

Firefox, Chrome, IE, and other popular browsers cannot get past the block on their own, but Alkasir can.

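VI. IPv6 (1)

If your broadband connection (for example, the education network) supports IPv6, you can directly reach blocked sites that have IPv6 addresses; otherwise, you can use a third-party tool (such as gogoCLIENT) to reach them.
After downloading and installing gogoCLIENT, you can reach Google, Twitter, YouTube, and other blocked sites that support IPv6 in the following three ways:
1. Append the .sixxs.org suffix to the site's domain name; for example, visit Google through Google.com.sixxs.org;
2. Use an automatic proxy service, that is, a "proxy.pac" link, for example:
3. Add the blocked sites' IPv6 addresses to the hosts file. Windows users can find the file at the following path:
In addition, you can find the IPv6 addresses of Google, YouTube, Twitter, and other popular blocked sites in this Google document.
If you have no IPv6 network, you can also add to the hosts file those IPv4 addresses (what is usually just called an IP) of the blocked sites that have not yet been blocked, and then reach the sites directly without any tool. You can also easily look up all the IPv4 addresses of a domain through the OpenDNS site.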

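VII. Other (3)

All the tools mentioned above break through network blocking directly; besides these, you can also get around the block indirectly in the following 3 ways:
98. Google web cache
If a blocked site has been indexed by Google, you can view it through the cached page in Google's search results.
99. Google Translate
With Google Translate, enter the link of the blocked page and choose a different language, and you can browse the translated page, or even the original.
100. Google Reader
Many sites now provide an RSS feed, so you can subscribe to them through Google Reader.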


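Among the 100 free circumvention tools above it is hard to say which is best, because any of them could be blocked at any time; each extra one is one more safeguard.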

Research Blogging Awards 2010

The Winners

Finalists for each award are listed in random order.

明夷待访录 (Mingyi daifang lu) - a Chinese counterpart to Du Contrat Social



50 botanical blogs


Botanical Blogs
These blogs are botany-focused, providing news and information you can use as a student and future plant scientist.
  1. AoB Blog: If you’re planning on being a professional in the field of botany, bookmark this blog now. You’ll get updates about the latest research in the field from the Annals of Botany.
  2. Plant Science Blog: A division of the Biology Blog, this site offers up some of the latest discoveries in plant science, with new posts regularly.
  3. Plants and Botany: This group of plant lovers post pictures, questions and discussions on this blog, which can be a great way for newbies to the field to learn more.
  4. Foothills Fancies: Embrace the wonders of the natural world with this blog, written by naturalist S.L. White.
  5. PAPGREN: The Pacific Agricultural Genetic Resource Network supplies those interested in plant genetic research and discovery with news and helpful articles here.
  6. Berry Go Round: Looking for the best posts and blogs to read about plants? This blog hosts a monthly carnival, with links to a wide range of interesting botanical material.
  7. Blog: Botany: On this blog you’ll find posts from Dr. Robson of The Manitoba Museum. She shares some interesting stories about the plants the museum houses.
  8. Thomas’ Plant-Related Blog: Blogger Thomas loves plants, and on this site you’ll find numerous articles on interesting plant species, from plankton to broccoli.
Use these blogs to learn about some new plant genera and species, some of which are amazingly unique.
  1. Botany Blog: This blog is a must-read for any botany student or plant lover. Readers will find regular posts highlighting one particular plant species, a great way to learn more about the innumerable plant species out there.
  2. Net World Directory Botany Blog: Check back with this blog often to learn more about specific plants as well as their cultivation and use.
  3. Get Your Botany On!: This blog is filled with beautiful images and descriptions of plants as well as some posts touching on important news items in the world of botany.
  4. Plants are the Strangest People: Here you’ll find a plant-loving blogger who works in a garden center, posting on everything from useful plant resources to the care and feeding of houseplants.
  5. Exploring the World of Trees: Want to know more about the tree species of the world? This blog is a good place to start, with posts focusing on specific types of trees accompanied by pretty pictures of them.
Botanists and Plant Experts
Hear from professors, botanists and a wide range of plant experts on these great blogs.
  1. Niches: Here, Georgia bloggers and plant biologists Wayne and Glenn discuss native plants, habitats and the field at large.
  2. Talking Plants: Jaime Plaza of the Sydney Botanic Gardens shares photos and information on plants on this blog.
  3. The Phytophactor: This blogger shares a love of plants on this site, but also works as a botanist focusing on economic botany, rain forest ecology and plant diversity.
  4. Biofortified: With a wide range of professionals in plant science fields posting to this site, it’s a great read for anyone interested in the genetic manipulation of plants for agricultural purposes.
  5. Invasive Species Weblog: Jennifer Forman Orth, an invasive plant ecologist, shares her expertise on invasive plant problems troubling biologists the world over.
  6. California Botany Blog: Dean William Taylor offers a look at some of his research into flowers and seeds on this blog.
  7. Seeds Aside: Focusing on plant evolution and ecology, this blogger shares what he’s learning in botany with readers on this site.
  8. Botanizing: Larry Hufford, Professor of Biology, posts about his expeditions into the natural world here, with lovely photos to illustrate.
  9. A Digital Botanic Garden: This blog is the home of Phil Gates, a botanist working at Durham University. You’ll find excellent posts on a wide variety of plant life that will help you learn about and marvel at the plant world.
These students share their research and passion for plants through their blogs.
  1. My Growing Passion: Margaret Morgan is studying to get her degree in Biology, with a focus on plant life, but she also just plain has a passion for botanicals. On this site, see posts that reflect both her interest in growing and learning about plants.
  2. James and the Giant Corn: Grad and doctoral students in botany can take a peek into another student’s work through this blog from Berkeley student James.
  3. Moss Plants and More: Read through this blog to learn more about bryology, the study of mosses, from graduate student Jessica.
If you want a blog that focuses on one type of plant, ecosystem or botany subject, these sites are excellent resources.
  1. Cactus Blog: If you prefer your plants pointy and leafless, then check out this cactus-centric blog. You’ll learn about all things succulent and cacti related.
  2. Wild Plants Post: While cultivated plants can be great, this blog chooses to focus on their wild cousins, sharing posts about ecosystems and evolution as well.
  3. My Orchids Journal: Many houseplant enthusiasts love to grow orchids and other tropical flowering plants. Learn more about what it takes to get them growing right from this blog.
  4. Early Forest: Those with a passion for trees and forestry will find this blog, and its amazing photos, a great inspiration.
  5. Treeblog: Can’t get enough of those monsters of the plant world, trees? This blog is full of great information and guidance on how to learn more.
  6. No Seeds, No Fruits, No Flowers: No Problem: With hundreds of different and diverse varieties, ferns have fascinated people for hundreds of years. Learn more about these beautiful plants through this blog.
  7. SwampThings: You’ll get a chance to better understand the ecology of the plants and animals that call the swamp home when you read this blog.
  8. The Plant Mafia Blog: These plant lovers are committed to raising beautiful, healthy plants and share them with each other, readers and local botanical gardens.
Botanic gardens are great places to learn more about plants and to see some rare and beautiful varieties in person. If you don’t live near one or just want a quick botanical fix, visit these blogs instead.
  1. Botany Photo of the Day: The UBC Botanical Garden posts a new photo of the gorgeous plants they care for on this blog every few days.
  2. Plant Talk: This blog from the New York Botanical Garden is a great source of information not only about the gardens but about plant life in general.
  3. Denver Botanic Gardens: Find out just what’s happening in Denver’s Botanic Gardens from this blog.
  4. Lewis Ginter Botanical Garden: See photos of the beautiful greenhouses and gardens at this botanical garden on their blog, and get advice on great books, plant care and much more.
  5. Ogden Botanical Gardens: This blog will inspire your love of plants even more, with information about classes, seasonal bloomers and gardening activities.
  6. The Dig!: Green Bay Botanical Gardens share just what’s happening every season of the year, letting those in colder climes know when it’s worth it to brave the weather to see beautiful plants.
  7. Norfolk Botanical Garden: With detailed posts about the trees and flowers in their collection, this blog is a great read whether you can visit these gardens or not.
Horticulture and Agriculture
Plants can be wonderful on their own, but much human interaction with plants has to do with using and manipulating them to suit our own needs. That’s where these blogs come in. They’ll teach you the essentials of raising plants for pleasure and for sustenance.
  1. Garden Voices: If you don’t have time to browse multiple blogs, consider this blog that collects some of the best gardening posts from the web.
  2. Garden Rant: You’ll find gardening advice aplenty on this excellent horticultural blog.
  3. Agricultural Biodiversity Weblog: Read through this blog to gain a better understanding of why agricultural biodiversity is so important to world food supplies.
  4. Landscape Juice: With posts on gardening, landscape ecology, tree planting and other plant-focused matters, this blog is an excellent resource for those interested in botany or horticulture.
  5. Heavy Petal: Learn more about organic, urban gardening from this great blog.
  6. In the Herb Garden: The Herb Companion Magazine is home to this blog containing posts from a number of gardeners and herb lovers.
  7. Growing With Plants: If you want help deciding what to plant in your own backyard or just want to learn more about decorative plants, check out this blog.
  8. The Plant Hunter: Tim Wood travels the world in search of the coolest plants for home and garden and posts about them here.
  9. Love Plant Life Blog: This blog is focused on agriculture and growing food and while you can learn more about plants, you’re likely to learn more about policy.
  10. Plant Guides Blog: Need access to a helpful plant growing guide? This site is full of them.


what is remainder or modulus in python


With integer operands, modulo is performed in the integer context, not fractional (remainders are integers). Therefore:
1 % 1 = 0  (1 times 1 plus 0)
1 % 2 = 1  (2 times 0 plus 1)
1 % 3 = 1  (3 times 0 plus 1)
6 % 3 = 0  (3 times 2 plus 0)
7 % 3 = 1  (3 times 2 plus 1)
8 % 3 = 2  (3 times 2 plus 2)
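
The same arithmetic can be checked directly in Python. This is only a small sketch: it uses the built-in divmod, and the helper name explain_mod is made up here for illustration.

```python
def explain_mod(a, b):
    """Return a string showing how a % b arises from a = b * q + r."""
    q, r = divmod(a, b)  # q is the floor quotient, r is the remainder
    return f"{a} % {b} = {r}  ({b} times {q} plus {r})"


# Reproduce the table above.
for a, b in [(1, 1), (1, 2), (1, 3), (6, 3), (7, 3), (8, 3)]:
    print(explain_mod(a, b))
```

One thing the table does not show: with a negative left operand Python's remainder takes the sign of the divisor, so -7 % 3 is 2 (3 times -3 plus 2).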

My motto is “learn by doing.”

I just learned this expression today

Please give credit where credit is due

Please give credit where credit is due. Plant biologists would rightfully be ridiculed if they claimed to have made new discoveries while equivalent phenomena were already known from animals or fungi. Given that the value of the world's agriculture is more than three times that of the entire pharmaceutical industry and that many more people die each year of hunger and malnutrition than from cancer, it is time that scientists of all stripes paid more attention to plant biology.


What can be done in R that can't be done with Python/Numpy/SciPy



regular expressions and R


Phylogenetics in R - a tutorial


visualizing/graphical tools/books

(1) protovis, D3.js - A graphical toolkit for visualization 

(2) Beautiful visualization

(3) Data flow

(4) Visualize this

(5) Beautiful data


Teach Yourself Programming in Ten Years - such a heavy title


Software Carpentry - Helping scientists make better software/programming


Version 4

We are currently updating the content and format so that students can work through the material they need, when they need it. The lectures we have completed so far cover:
  • Version Control – Learn how to collaborate with other people and automatically create a record of previous work using a version control system.
  • The Shell – Much of scientific computing involves the Unix operating system.  Effectively using the shell is one of the first steps to efficient Unix programming.
  • Python – A versatile open source language that is increasingly popular among scientific programmers.
  • Testing – The basics of software testing, including exception handling and unit testing.
  • Sets and Dictionaries – Using associative data structures to better represent data that isn’t a list or vector.
  • Regular Expressions – Manipulate text quickly with this powerful set of pattern matching tools.
  • Databases – An introduction to SQL, the most popular database query language.
  • Classes and Objects – The basics of object-oriented programming.
  • Program Design – An example-driven introduction to effective program design.
  • Systems Programming – How to manipulate files and directories from a program.
  • Make – This tool will help automate everything from large software builds to batch processes.
  • Matrix Programming – Use array libraries to make numerical programs smaller and faster.
  • MATLAB – The world’s most popular numerical programming language.
  • Multimedia Programming – Work with images, sound, and other media.
  • Spreadsheets – Learn to use spreadsheets for data organization, analysis, and visualization.
  • Essays – Longer (non-video) discussion of some important ideas in scientific programming.
  • Recommended Reading – An annotated bibliography.
  • Glossary – Key terms.

Version 3

The lecture notes from Version 3 of this course (2004-2009) are available at http://software-carpentry.org/3_0/.

general advice to undergraduates who are interested in science


Here is a blog article that lists some advice for undergraduates.

It took me a while to respond for several reasons, not the least of which is that I'm super busy. And also because it felt kind of weird to answer. I mean, I didn't study neuroscience as an undergrad. And I didn't do well as an undergrad. So I certainly shouldn't be doling out advice!

(1) It's important to me that other people know how hard this life, science, and career stuff really is. People should know that often, success doesn't come easy.

(2) The second piece of non-specific advice: learn to network. Talk to other researchers. Email people about their work when you have questions. Don't be shy. Or rather, go ahead and be shy but recognize that lots of people are shy and the only way to learn from them is to overcome your mutual shyness. Plus, researchers love to know that someone read their work and is interested.

This advice isn't meant as a Machiavellian ploy or anything. Networking lets you meet smart people, which gives you new ideas and new collaborations. This, in turn, lets you do science faster and better.

Networking is sharing, not manipulating.

(3) Third: learn how to do your own data analysis. Know statistics well. Know at least some basic programming/scripting in Python, R, Matlab, etc. This will be of immense value in helping you get your research done efficiently and correctly, without needing to rely on other people's code (and time and commitment). This will become more important as our field becomes more data driven.

The Place to Compare Genomes (CoGe) - really good tools


Multi-element figures in R

a good example:

Markov Chain - interesting and attractive


the name "Hillary" in US


Adaptation to Climate Across the Arabidopsis thaliana Genome



PGDD - plant genome duplication database


PGDD is a public database to identify and catalog plant genes in terms of intragenome or cross-genome syntenic relationships. Current efforts focus on flowering plants with available whole genome sequences (preferably assembled pseudomolecules with ordered gene models).

guide for genotype imputation tools - IMPUTE2


estimate or interpolate recombination rate for you


MareyMap is a meiotic recombination rate estimation program. It is based on R and features a graphical interface in tcl/tk. MareyMap comes with an extensive dataset and several interpolation methods. The user may also use MareyMap with his own data. A more detailed description of the capabilities of the program can be found in the section features.

from Genetic map to genetic distance among physical markers. Especially for Arabidopsis thaliana.


Special Issue: Collaborative Bioinformatics and RNA Analysis


put your running accessions under nohup (running in background)

(1) put your running accessions under nohup (running in background)

Using the job control of bash to send the process into the background:

> [ctrl]+z
> bg

And as Sam/Jan mentioned, you have to execute disown so that the process is not killed when you close the terminal.

disown -h

(2) running your accessions under nohup directly

$ nohup yourcommand yourArgument &
$ nohup yourcommand yourArgument > standoutput &
$ nohup yourcommandinfile.sh &
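
When the job is started from a Python script rather than from an interactive shell, a roughly analogous effect can be had with the standard subprocess module. This is a sketch under assumptions, not a replacement for nohup: launch_detached is a made-up helper name, and start_new_session only has an effect on POSIX systems, where it puts the child in its own session so that closing the terminal does not hang it up.

```python
import subprocess
import sys


def launch_detached(cmd, log_path):
    """Start cmd in its own session, appending its output to log_path."""
    with open(log_path, "ab") as log:
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # do not read from the terminal
            stdout=log,                 # like "> standoutput" above
            stderr=subprocess.STDOUT,   # keep errors in the same log
            start_new_session=True,     # POSIX only: child calls setsid()
        )
```

For example, launch_detached([sys.executable, "long_job.py"], "job.log") returns immediately with a Popen object; long_job.py and job.log are placeholder names.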

The Plant Genome: An Evolutionary View on Structure and Function

This is a special issue of the plant journal:




chromopainter and finestructure

1. ChromoPainter

ChromoPainter is a tool for finding haplotypes in sequence data.  Each individual is "painted" as a combination of all other sequences. It can output a range of features, including:

  • Sample haplotypes
  • Expectations of the number of recombination events at all sites
  • A wide range of related features
It is useful to generate high quality Principal Components Analysis (PCA) from dense data, for creating data summaries for fineSTRUCTURE, for dating admixture events, and much more.

2. fineSTRUCTURE is a fast and powerful algorithm for identifying population structure using dense sequencing data.  By using the output of ChromoPainter as a (nearly) sufficient summary statistic, it is able to perform model-based Bayesian clustering on large datasets, including full resequencing data, and can handle up to 1000s of individuals. Full assignment uncertainty is given.

population structure - admixture, clustering and PCA

(1) Useful software other than STRUCTURE

(2) especially

mclust: Model-Based Clustering / Normal Mixture Modeling

(3) blogs about that


manuals for linux, R, bioconductor and next generation bioinformatics




The Patterns and Causes of Variation in Plant Nucleotide Substitution Rates



The Genetics of Hybrid Incompatibilities




good intro for beamer


Slopegraphs - a useful tool

1. for introduction of slopegraphs
2. for generating slopegraphs with R


outcrossing and selfing in plants - evolutionary and genomic consequence

  1. Boris Igic, Russell Lande, Joshua R. Kohn
    DOI: 10.1086/523362
    Stable URL: http://www.jstor.org/stable/10.1086/523362
  2. Genomic Consequences of Outcrossing and Selfing in Plants (pp. 105-118)
    Stephen I. Wright, Rob W. Ness, John Paul Foxe, Spencer C. H. Barrett
    DOI: 10.1086/523366
    Stable URL: http://www.jstor.org/stable/10.1086/523366


funnel plot

A funnel plot is a scatterplot of treatment effect against a measure of study size. It is used primarily as a visual aid to detecting bias or systematic heterogeneity. A symmetric inverted funnel shape arises from a ‘well-behaved’ data set, in which publication bias is unlikely. An asymmetric funnel indicates a relationship between treatment effect and study size.

A funnel plot is a useful graph designed to check the existence of publication bias in systematic reviews and meta-analyses. It assumes that the largest studies will be near the average, and small studies will be spread on both sides of the average. Variation from this assumption can indicate publication bias.
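
To make the geometry concrete, here is a minimal sketch of the numbers behind a funnel plot. It assumes the standard error is used as the measure of study size and that the centre line is an inverse-variance weighted mean; the study values are invented for illustration, and the function names are not from any package.

```python
def pooled_effect(studies):
    """Inverse-variance weighted mean of (effect, standard_error) pairs."""
    weights = [1.0 / se ** 2 for _, se in studies]
    weighted_sum = sum(w * eff for w, (eff, _) in zip(weights, studies))
    return weighted_sum / sum(weights)


def funnel_limits(center, se, z=1.96):
    """Pseudo 95% limits of the funnel at a given standard error."""
    return center - z * se, center + z * se


# Hypothetical (effect, standard error) pairs, made up for this example.
studies = [(0.30, 0.05), (0.28, 0.08), (0.35, 0.12), (0.10, 0.20), (0.75, 0.20)]

center = pooled_effect(studies)
for effect, se in studies:
    low, high = funnel_limits(center, se)
    inside = low <= effect <= high
    print(f"effect={effect:.2f} se={se:.2f} "
          f"funnel=[{low:.2f}, {high:.2f}] inside={inside}")
```

Plotting effect on the x axis against standard error on an inverted y axis, together with the two limit lines, gives the funnel; small studies that fall outside it on one side only are the pattern to look for.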




constrained elements in the human genome as revealed by mammalian alignments of 29 genomes


  1. Lindblad-Toh, K. et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 12 Oct 2011 (doi: 10.1038/nature10530)
  2. Lin, M. F. et al. Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes. Genome Res. 12 Oct 2011 (doi: 10.1101/gr.108753.110)


What do we need to know about speciation



Quantifying effects of environmental and geographical factors on patterns of genetic differentiation

The authors used a regression-based approach to simultaneously estimate the quantitative contributions of environmental adaptation and isolation by distance on genetic variation in Boechera stricta, a wild relative of Arabidopsis.



Dated origin of Arabidopsis thaliana - 13 Mya

This paper gives a much more robust date for the origin of Arabidopsis thaliana:

Dated molecular phylogenies indicate a Miocene origin for Arabidopsis thaliana


We bring previously overlooked fossil evidence to bear on these questions and find the split between A. thaliana and Arabidopsis lyrata occurred about 13 Mya, and that the split between Arabidopsis and the Brassica complex (broccoli, cabbage, canola) occurred about 43 Mya.


Usefulness of normality tests - apparently limited - be careful

1. Normality tests don't do what most think they do. Shapiro's test, Anderson Darling, and others are null hypothesis tests AGAINST the assumption of normality. These should not be used to determine whether to use normal theory statistical procedures. In fact they are of virtually no value to the data analyst. Under what conditions are we interested in rejecting the null hypothesis that the data are normally distributed? I have never come across a situation where a normality test is the right thing to do. When the sample size is small, even big departures from normality are not detected, and when your sample size is large, even the smallest deviation from normality will lead to a rejected null.


2. I, personally, have never come across a situation where a normality test is the right thing to do. The problem is that when the sample size is small, even big departures from normality are not detected, and when your sample size is large, even the smallest deviation from normality will lead to a rejected null.
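
The sample-size effect is easy to try out. The sketch below is an illustration under stated assumptions, not the tests named above: it swaps in the Jarque-Bera test because it needs only the standard library, and it uses the chi-square approximation with 2 degrees of freedom for the p-value, which is crude for small samples. The two data sets are simulated, not real.

```python
import math
import random


def jarque_bera(xs):
    """Return (statistic, p_value) of the Jarque-Bera normality test."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    stat = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # A chi-square variable with 2 degrees of freedom has
    # survival function exp(-x / 2), so no table lookup is needed.
    return stat, math.exp(-stat / 2.0)


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers

# Strongly non-normal data (exponential), but only 10 observations.
small = [rng.expovariate(1.0) for _ in range(10)]
# Nearly normal data (normal plus a mild skewed component), 200,000 observations.
large = [rng.gauss(0.0, 1.0) + 0.3 * rng.expovariate(1.0) for _ in range(200_000)]

print("n=10, exponential:     p =", jarque_bera(small)[1])
print("n=200000, near-normal: p =", jarque_bera(large)[1])
```

With the large sample the mild skew is enough to reject normality decisively, while samples as small as the first one frequently fail to reject even though the data are far from normal.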


HyPhy - Hypothesis testing using phylogenies

1. HyPhy is a scriptable package that can fit statistical evolutionary models to alignments of homologous sequences using maximum likelihood, estimate various parameters that have biological meaning (for example branch lengths, substitution rates, dN/dS ratios, recombination breakpoints), and test hypotheses about how sequences in the alignment have evolved. HyPhy focuses on inference about the evolutionary process. Even though it can do limited alignment and phylogenetic reconstruction, much better specialized programs exist for these purposes.
Here are some of the applications that HyPhy is often used for:
  • Positive and negative selection detection
  • Recombination analysis
  • Detecting co-evolving residues
  • Genomic and multiple-gene evolutionary inference
  • Molecular clock and relative rate tests
  • Nucleotide, protein and codon model selection
  • As a likelihood analysis engine for other software and web services
  • One-off analyses: tasks that no other package does out of the box and are not worth writing a specialized program for

2. Some of the most popular HyPhy functions (recombination, positive selection detection, etc) are implemented in a web-server hosted at http://www.datamonkey.org

Which codon sites are under diversifying positive or negative selection?
Three different codon-based maximum likelihood methods, SLAC, FEL and REL, can be used to estimate the dN/dS (also known as Ka/Ks or ω) ratio at every codon in the alignment. An exhaustive discussion of each approach can be found in the methodology paper. All methods can also take recombination into account. This is done by screening the sequences for recombination breakpoints, identifying non-recombinant regions and allowing each to have its own phylogenetic tree.
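
To make the dN/dS vocabulary concrete, here is a minimal sketch that labels each codon difference between two aligned coding sequences as synonymous (same amino acid) or non-synonymous (different amino acid). It is not SLAC, FEL or REL and it does not estimate dN/dS: those methods work on a phylogeny and normalize by the number of synonymous and non-synonymous sites. The function name, the labels and the example sequences are made up for illustration; only the standard genetic code table is a known fact.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_differences(seq_a, seq_b):
    """Label every aligned codon pair of two in-frame coding sequences."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    labels = []
    for start in range(0, len(seq_a), 3):
        codon_a = seq_a[start:start + 3].upper()
        codon_b = seq_b[start:start + 3].upper()
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            # Gaps or ambiguous bases: leave the codon out of the comparison.
            labels.append("skipped")
        elif codon_a == codon_b:
            labels.append("identical")
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            labels.append("synonymous")
        else:
            labels.append("nonsynonymous")
    return labels


# Hypothetical toy sequences: AAA -> AAG keeps lysine, GGT -> GAT changes
# glycine to aspartate.
labels = classify_codon_differences("ATGAAACTGGGT", "ATGAAGCTGGAT")
print(labels)
```
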
Is there evidence of selection in my alignment?
The PARRIS method, developed by Konrad Scheffler and colleagues, extends traditional codon-based likelihood ratio tests to detect if a proportion of sites in the alignment evolve with dN/dS>1. The method takes recombination and synonymous rate variation into account.
What is the evolutionary fingerprint of a gene?
The ESD method, described in a recent paper, fits a versatile general discrete bivariate model of site-by-site selective force variation to partition all sites into selective classes, and obtains an approximate posterior distribution of this partitioning. The resulting "noisy" distribution of selective regimes is the evolutionary fingerprint of a gene. The EVF (evolutionary fingerprinting) module implements this procedure, and can also infer which individual sites appear to be positively selected while accounting for parameter estimation error (analogous to the BEB methodology of the PAML package).
Which codon sites are under positive or negative selection at the population level?
The codon-based maximum likelihood IFEL method can investigate whether sequences sampled from a population (e.g. viral sequences from different hosts) have been subject to selective pressure at the population level (i.e. along internal branches). A discussion of the method and its application can be found here.
Did selective pressure vary along lineages, i.e. over time?
The codon-based genetic algorithm GABranch method can automatically partition all branches of the phylogeny describing non-recombinant data into groups according to dN/dS. Robust multi-model inference is used to collate results from all models examined during the run to provide confidence intervals on dN/dS for each branch and guard against model misspecification and overfitting (method details).
How about episodic diversifying selection (branch-site methods)?
Using a modeling framework that allows efficient estimation with models that permit dN/dS to vary across both sites and lineages, Datamonkey implements two tests geared towards finding lineages and sites subject to episodic diversifying selection (EDS).
The Branch-site REL method identifies those branches where a proportion of sites evolves under EDS. If you are primarily interested in finding which lineages (but don't care about which sites) have experienced EDS, use this method. Alternatively, if you are interested in sites (but don't care about which lineages) subject to EDS, then the MEME method is appropriate.
What about different types of selection?
Protein sequences can be screened for evidence of directional selection using the DEPS method, described here, which is useful when one wants to detect convergent evolution or selective sweeps. For coding sequences, the TOGGLE model, developed by Wayne Delport and colleagues, can detect selection-driven changes that result in amino-acid toggling. A canonical example of this can be found in immune-driven evolution of HIV-1 (escape and reversion).
Which evolutionary model should I use for my data?
For each type of data, nucleotide, amino-acid and codon, Datamonkey implements separate model selection procedures. An exhaustive search is performed for all possible (Markov, time-reversible) models of nucleotide evolution. For protein data, a collection of published empirical models are fitted to the alignment and the best one is selected using AICc. Finally, for coding data, a sophisticated genetic-algorithm procedure described in our recent paper is used to examine thousands of potential models and report the best one and various metrics based on the set of credible models - this feature is implemented in the CMS module.
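
The AICc comparison mentioned for protein models can be sketched as follows. This is not Datamonkey's code: the model names, log-likelihoods and parameter counts below are invented numbers, and using the number of alignment sites as the sample size is an assumption. Only the formula itself, AICc = 2k - 2 ln L + 2k(k + 1)/(n - k - 1), is standard.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Small-sample corrected Akaike information criterion (lower is better)."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc needs n_obs > n_params + 1")
    aic = 2.0 * n_params - 2.0 * log_likelihood
    correction = 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    return aic + correction


# Hypothetical fits: model name -> (maximized log-likelihood, free parameters).
candidate_fits = {
    "model_A": (-1234.5, 10),
    "model_B": (-1230.0, 20),
    "model_C": (-1233.9, 11),
}
n_sites = 300  # assumed sample size: number of sites in the alignment

scores = {
    name: aicc(log_l, k, n_sites) for name, (log_l, k) in candidate_fits.items()
}
best_model = min(scores, key=scores.get)
for name in sorted(scores, key=scores.get):
    print(name, round(scores[name], 3))
print("selected:", best_model)
```

In this toy comparison the model with the highest likelihood is not the one selected, because the penalty for its extra parameters outweighs the gain in fit.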
Did any sites co-evolve?
A Bayesian graphical model is deduced from reconstructed substitutions at each branch/site combination to infer conditional evolutionary dependencies of sites in the alignment, i.e. whether a site is more or less likely to experience a non-synonymous substitution at a branch when certain other sites do (or do not) experience non-synonymous substitutions at the same branch. The SPIDERMONKEY method was introduced in the evolutionary context in our paper on the evolution of the phenotypically important and highly variable V3 loop of the envelope glycoprotein in HIV-1.
Has recombination acted upon sequences in an alignment?
Recombination leaves an imprint on sequence alignments: different segments of the alignment may be described by different phylogenetic trees, called phylogenetic discordance. Datamonkey.org implements two methods: SBP, suitable for answering the question "Is there evidence of recombination in the alignment?", and GARD, which attempts to find all the recombination breakpoints. Both methods are described in this paper. The output of GARD is accepted by most other analyses, and because recombination can mislead phylogenetic analyses that do not account for it, we strongly urge that recombination testing be done on any alignment that is going to be analyzed for positive selection. You can also submit a collection of HIV-1 sequences for recombination screening by a specialized recombination detection algorithm, SCUEAL, described in this paper.
What were the ancestral sequences?
The ASR module implements three different approaches to reconstructing ancestral sequences from simple or partitioned alignments: joint, marginal and sampled (see this paper for a description and original methodology attribution).
3. One functionality from HyPhy:

A random effects branch-site model for detecting episodic diversifying selection