evolving all we are: Useful Bash commands to handle FASTA files

Saturday, February 25, 2012

http://biostar.stackexchange.com/questions/17726/useful-bash-commands-to-handle-fasta-files

http://chrisduran.eu/bioinformatics/linux-and-osx-commands-for-working-with-fasta-files/

#####################################################

(1) counting number of sequences in a fasta file:

grep -c "^>" file.fa

Remove comments (everything after the first whitespace in each header line):

sed -e 's/^\(>[^[:space:]]*\).*/\1/' my.fasta > mymodified.fasta

(2) add something to end of all header lines:

sed 's/^>.*/&WHATEVERYOUWANT/' file.fa > outfile.fa

(3) clean up a fasta file so that only the first column of each header is kept:

awk '{print $1}' file.fa > output.fa

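(4) To extract IDs, use the following. Note that \w+ stops at the first character that is not a letter, digit, or underscore, so an ID such as sp|P12345 comes out as sp: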


grep -o -E "^>\w+" file.fasta | tr -d ">"
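For IDs that contain characters like "|" or ".", a variant that keeps everything up to the first whitespace may be more useful. This is only a sketch: the demo file and its contents are made up for illustration, and the variable name ids is arbitrary.

```shell
# Build a tiny demo FASTA (hypothetical contents) in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/demo.fasta" <<'EOF'
>sp|P12345|TEST_HUMAN some description
ACGTACGT
>seq2.1 another description
GGCC
EOF

# On header lines, drop the leading ">" and print the first
# whitespace-delimited field, which is taken to be the ID.
ids=$(awk '/^>/ {sub(/^>/, ""); print $1}' "$tmpdir/demo.fasta")
printf '%s\n' "$ids"

rm -r "$tmpdir"
```

On the demo file this prints sp|P12345|TEST_HUMAN and seq2.1, one per line.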


(5) A useful step is to linearize your sequences (i.e. remove the line wrapping inside each sequence). This is not a perfect solution: a few steps could probably be avoided, and it assumes that no header contains a "#" character. It is quite fast, though, even for thousands of sequences.
sed -e 's/\(^>.*$\)/#\1#/' file.fasta | tr -d "\r" | tr -d "\n" | sed -e 's/$/#/' | tr "#" "\n" | sed -e '/^$/d'
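The same linearization can also be sketched in a single awk call, which avoids using "#" as a temporary marker. This is an alternative under my own assumptions, not the method from the linked threads; the demo file is invented for illustration.

```shell
# Demo input with wrapped sequence lines (hypothetical contents).
tmpdir=$(mktemp -d)
cat > "$tmpdir/demo.fasta" <<'EOF'
>a first
ACGT
TTAA
>b second
GG
CC
AA
EOF

awk '
  { sub(/\r$/, "") }                  # strip Windows carriage returns
  /^>/ { if (seq != "") print seq     # flush the previous sequence
         print; seq = ""; next }
  { seq = seq $0 }                    # join wrapped sequence lines
  END { if (seq != "") print seq }    # flush the last sequence
' "$tmpdir/demo.fasta" > "$tmpdir/linear.fasta"

linear=$(cat "$tmpdir/linear.fasta")
printf '%s\n' "$linear"

rm -r "$tmpdir"
```

Each record in the output has exactly two lines: the header, then the whole sequence on one line.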


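(6) Remove duplicated sequences. Pierre Lindenbaum proposed this solution; the sed -e '/^$/d' step is added here to drop the empty first line that the "#" split leaves behind (it would otherwise come out as a lone ">"). The comparison is case-insensitive (-f), the output is sorted by sequence, and the command assumes that no header contains "#" or "@". It needs bash for $'\t' and GNU sed for the \t and \n escapes.
sed -e '/^>/s/$/@/' -e 's/^>/#/' file.fasta | tr -d '\n' | tr "#" "\n" | sed -e '/^$/d' | tr "@" "\t" | sort -u -t $'\t' -f -k 2,2  | sed -e 's/^/>/' -e 's/\t/\n/'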
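If you would rather keep the input order and the first occurrence of each sequence, an awk sketch along the following lines should work. It is my own variant rather than the solution credited above: it compares sequences case-insensitively like the pipeline, but it does not sort and it writes each sequence on one line. The demo file and the function name emit are made up.

```shell
# Demo input: s2 repeats s1 in lower case, s4 repeats s1 with line wrapping.
tmpdir=$(mktemp -d)
cat > "$tmpdir/demo.fasta" <<'EOF'
>s1
ACGT
>s2
acgt
>s3
GGCC
>s4
AC
GT
EOF

awk '
  # Print the current record unless its sequence was seen before.
  function emit(   key) {
    if (hdr == "") return
    key = toupper(seq)                # compare case-insensitively
    if (!(key in seen)) { seen[key] = 1; print hdr; print seq }
  }
  /^>/ { emit(); hdr = $0; seq = ""; next }
  { seq = seq $0 }                    # join wrapped sequence lines
  END { emit() }
' "$tmpdir/demo.fasta" > "$tmpdir/unique.fasta"

unique=$(cat "$tmpdir/unique.fasta")
printf '%s\n' "$unique"

rm -r "$tmpdir"
```

On the demo file only s1 and s3 survive. All distinct sequences are held in memory, so this is meant for files that fit comfortably in RAM.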






(7) Splitting a FASTA file of multiple sequences into FASTA files of individual sequences


This command creates one file per member sequence in the current working directory, numbered incrementally with a .fasta extension (e.g. for an input file with 5 member sequences, such as the Arabidopsis genome, it will output files 1.fasta to 5.fasta). Here file.fa is a placeholder for your input file.
awk '/^>/{f=++d".fasta"} {print > f}' file.fa
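With many sequences, some awk implementations can run out of open file handles, because every output file stays open until awk exits. A hedged variant that closes each file before starting the next is sketched below. The outdir variable and the demo file are my own additions for illustration.

```shell
# Demo input with three records (hypothetical contents).
tmpdir=$(mktemp -d)
cat > "$tmpdir/demo.fasta" <<'EOF'
>a
ACGT
>b
GG
CC
>c
TTTT
EOF
mkdir "$tmpdir/split"

awk -v outdir="$tmpdir/split" '
  /^>/ { if (f != "") close(f)          # release the previous output file
         f = outdir "/" (++d) ".fasta" }
  f != "" { print > f }                 # skip any lines before the first header
' "$tmpdir/demo.fasta"

nfiles=$(ls "$tmpdir/split" | wc -l | tr -d ' ')
second=$(cat "$tmpdir/split/2.fasta")
echo "$nfiles files written"

rm -r "$tmpdir"
```

Writing into a separate directory also keeps the numbered files from colliding with an input file that happens to be called 1.fasta.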


(8) Joining multiple FASTA files into a single, multi-sequence FASTA file

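This is the reverse of the above and we will assume a few things. Firstly, you want to combine all fasta files in the current directory and, secondly, they all have the same extension (.fasta). Adapt to your needs if this is not the case! The output name combined.fa is a placeholder; pick a name that does not match *.fasta, otherwise a leftover output file from an earlier run would be picked up as input.
cat *.fasta > combined.fa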
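One catch with plain cat: if a file does not end with a newline, the next file's header gets glued onto the last sequence line. A small sketch of a workaround uses awk 1, which prints every line with a newline at the end. The file names below are invented for the demo.

```shell
# Two demo files; the first one deliberately lacks a trailing newline.
tmpdir=$(mktemp -d)
printf '>a\nACGT'  > "$tmpdir/a.fasta"
printf '>b\nGG\n'  > "$tmpdir/b.fasta"

# "awk 1" copies every input line and always terminates it with a newline.
awk 1 "$tmpdir"/*.fasta > "$tmpdir/combined.fa"

nheaders=$(grep -c '^>' "$tmpdir/combined.fa")
echo "$nheaders"

rm -r "$tmpdir"
```

Both headers start their own line in the combined file, so the count is 2. With cat the second header would end up on the same line as ACGT.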


(10) List the sequence headers in a FASTA file
grep "^>" file.fasta


(11) Counting the number of sequence entities in a FASTA file
grep "^>" file.fasta | wc -l


(12) Determining the length of the sequence in a FASTA file

This method gives the TOTAL sequence length of a FASTA file: if the file has several sequence entries, it returns the sum of their lengths. To get the length of individual entries you would first need to split the file into individual entries, or do it programmatically, either with a homegrown method or with a bioinformatics library such as BioPerl.
grep -v "^>" file.fasta | tr -d '[:space:]' | wc -c
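As one possible "homegrown method" for per-entry lengths, here is a minimal awk sketch. It assumes the ID is the first word of the header, and the demo file is made up for illustration.

```shell
# Demo input with two records (hypothetical contents).
tmpdir=$(mktemp -d)
cat > "$tmpdir/demo.fasta" <<'EOF'
>a first
ACGT
TTAA
>b
GGC
EOF

awk '
  /^>/ { if (n) print id "\t" len     # report the previous record
         id = substr($1, 2); len = 0; n++; next }
  { gsub(/[ \t\r]/, ""); len += length($0) }   # ignore stray whitespace
  END { if (n) print id "\t" len }    # report the last record
' "$tmpdir/demo.fasta" > "$tmpdir/lengths.tsv"

lengths=$(cat "$tmpdir/lengths.tsv")
printf '%s\n' "$lengths"

rm -r "$tmpdir"
```

The output is one tab-separated line per entry (ID, then length), here a with 8 and b with 3.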
