It is cited from here.
Xargs
Xargs is one of my most frequently used UNIX commands. Unfortunately, many UNIX users overlook how powerful it is.
- delete all *.txt files in a directory:
- find . -name "*.txt" | xargs rm
- package all *.pl files in a directory:
- find . -name "*.pl" | xargs tar -zcf pl.tar.gz
- kill all processes that match "something":
- ps -u `whoami` | awk '/something/{print $1}' | xargs kill
- rename all *.txt as *.bak:
- find . -name "*.txt" | sed "s/\.txt$//" | xargs -i echo mv {}.txt {}.bak | sh
- run the same command 100 times (for bootstrapping, for example):
- perl -e 'print "$_\n" for (1..100)' | xargs -i echo bsub -o {}.out -e {}.err some_cmd | sh
- submit all commands in a command file (one command per line):
- cat my_cmds.sh | xargs -i echo bsub {} | sh
The last three examples only work with GNU xargs. BSD xargs does not accept '-i'.
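Both GNU and BSD xargs do accept '-I', which replaces occurrences of the given string in the command arguments, so the renaming example can be written portably. A sketch of that example rewritten this way:
- find . -name "*.txt" | sed "s/\.txt$//" | xargs -I {} mv {}.txt {}.bak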
Find
In a directory, the find command finds all the files that meet certain criteria. You can write very complex rules at the command line, but I think the following examples are the most useful:
- find all files with the extension ".txt" (including files in subdirectories):
- find . -name "*.txt"
- find all directories:
- find . -type d
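Criteria can also be combined on the command line. For instance, a sketch that lists regular *.txt files modified within the last 7 days:
- find . -type f -name "*.txt" -mtime -7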
Awk
Awk is a programming language that is specifically designed for quickly manipulating space-delimited data. Although you can achieve all its functionality with Perl, awk is simpler in many practical cases. You can find a lot of online tutorials, but here I will only show a few examples that cover most of my daily uses of awk.
- choose rows where column 3 is larger than column 5:
- awk '$3>$5' input.txt > output.txt
- extract columns 2, 4 and 5:
- awk '{print $2,$4,$5}' input.txt > output.txt
- awk 'BEGIN{OFS="\t"}{print $2,$4,$5}' input.txt
- show rows between 20th and 80th:
- awk 'NR>=20&&NR<=80' input.txt > output.txt
- calculate the average of column 2:
- awk '{x+=$2}END{print x/NR}' input.txt
- regex (egrep):
- awk '/^test[0-9]+/' input.txt
- calculate the sum of columns 2 and 3, appending it to the end of each row or replacing the first column:
- awk '{print $0,$2+$3}' input.txt
- awk '{$1=$2+$3;print}' input.txt
- join two files on column 1:
- awk 'BEGIN{while((getline<"file1.txt")>0)l[$1]=$0}$1 in l{print $0"\t"l[$1]}' file2.txt > output.txt
- count the number of occurrences of each value in column 2 (uniq -c):
- awk '{l[$2]++}END{for (x in l) print x,l[x]}' input.txt
- apply "uniq" on column 2, only printing the first occurrence (uniq):
- awk '!($2 in l){print;l[$2]=1}' input.txt
- count different words (wc):
- awk '{for(i=1;i<=NF;++i)c[$i]++}END{for (x in c) print x,c[x]}' input.txt
- deal with simple CSV:
- awk -F, '{print $1,$2}'
- substitution (sed is simpler in this case):
- awk '{sub(/test/, "no", $0);print}' input.txt
Cut
Cut cuts specified columns. The default delimiter is a single TAB.
- cut the 1st, 2nd, 3rd, 5th, 7th and following columns:
- cut -f1-3,5,7- input.txt
- cut the 3rd column with columns separated by a single space:
- cut -d" " -f 3 input.txt
Note that awk, like Perl's split, treats any run of consecutive blank characters as a single delimiter, whereas cut takes exactly one character as the delimiter and splits on every occurrence of it.
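A minimal sketch of the difference, on a line that contains two consecutive spaces: awk still reports "b" as the second field, while cut reports an empty one:
- echo "a  b" | awk '{print $2}'
- echo "a  b" | cut -d" " -f2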
Sort
Almost all scripting languages have a built-in sort, but none of them is as flexible as the sort command. In addition, GNU sort is memory efficient: I once sorted a 20 GB file with less than 2 GB of memory. It is not trivial to implement such a powerful sort yourself.
- sort a space-delimited file based on its first column, then the second if the first is the same, and so on:
- sort input.txt
- sort a huge file (GNU sort ONLY):
- sort -S 1500M -T $HOME/tmp input.txt > sorted.txt
- sort starting from the third column, skipping the first two columns:
- sort -k3 input.txt
- sort the second column as numbers, descending order; if identical, sort the 3rd as strings, ascending order:
- sort -k2,2nr -k3,3 input.txt
- sort starting from the 4th character at column 2, as numbers:
- sort -k2.4n input.txt
Other Tips
- use parentheses to group commands in a subshell (a cd inside does not affect the current shell):
- (echo hello; echo world; cat foo.txt) > output.txt
- (cd foo; ls bar.txt)
- save stderr output to a file:
- some_cmd 2> output.err
- redirect stderr to stdout:
- some_cmd 2>&1 | more
- some_cmd >output.outerr 2>&1
- view a text file using 4 as a TAB size and without line wrapping:
- less -S -x4 text.txt
- [sed] substitute 'foo(\d+)' as "(\d+)bar":
- sed "s/foo\([0-9]*\)/\1bar/g"
- perl -pe 's/foo(\d+)/$1bar/g"
- [uniq] count the occurrences of the different strings in column 2:
- cut -f2 input.txt | sort | uniq -c
- grep "--enable" in a file (use "--" to prevent grep from parsing command options):
- grep -- "--enable" input.txt