1.
http://www.biostars.org/post/show/5664/hardware-needed-to-analyse-microarray-data-with-rbioconductor/
Many analysis procedures are memory-hungry; 8GB of RAM is inadequate. Go for 64GB or 96GB.
The speed of individual CPU cores matters less than the number of them. Thus it is far better to buy a configuration with more, somewhat slower processors than with fewer of the fastest ones.
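A quick illustration of why core count tends to win: most array and short-read processing parallelizes across samples, so throughput scales with however many cores you can use at once. A minimal R sketch with the base parallel package (the process_one function and the CEL file names are made up for illustration):

    library(parallel)

    # Stand-in for whatever your real per-sample step is
    # (background correction, normalization, mapping one lane, etc.).
    process_one <- function(cel_file) {
      Sys.sleep(1)              # pretend this costs a second of CPU time
      basename(cel_file)
    }

    cel_files <- sprintf("sample_%02d.CEL", 1:32)

    # Wall-clock time is roughly (number of samples / number of cores),
    # so sixteen slower cores beat four fast ones for this kind of job.
    res <- mclapply(cel_files, process_one, mc.cores = detectCores())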
I've done analysis of Affy exon arrays (using XPS). You do not need a monster computer to do this analysis. What you do need is a big pile of RAM and adequate hard drive space. For example, you could buy an ordinary box (e.g. a Dell Precision workstation), buy 16 gigs of RAM from NewEgg, put in two 2-terabyte hard drives, and load a 64-bit Linux such as Ubuntu. This machine would also be perfectly adequate for many ordinary sorts of microarray analysis.
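To put rough numbers on the "big pile of RAM" point: a probe-level exon array intensity matrix has a few million rows, and R stores it as 8-byte doubles, so the matrix alone reaches gigabytes before normalization makes any temporary copies. A back-of-the-envelope estimate in R (the probe and sample counts are only illustrative):

    probes  <- 6.5e6   # order of magnitude for a human exon array
    samples <- 50
    gb <- probes * samples * 8 / 2^30
    round(gb, 1)       # about 2.4 GB for the raw matrix alone
    # Normalization and summarization routinely copy objects of this size,
    # so budget several times this figure - hence 16 GB rather than 8 GB.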
If you find yourselves doing a lot more complex bioinformatics analysis you will need multiple machines, but at that point the best bet is to use someone else's hardware -- either a local compute cluster or cloud machines. However, it sounds like you're not there yet, and there's no reason to spend $10,000+ on a machine at this point.
2.
http://www.biostars.org/post/show/43240/hardware-benchmarking-tasks-for-a-high-performance-bioinformatics-cluster/
3.
http://www.biostars.org/post/show/2604/hardware-suitable-for-generic-nextgen-sequencing-processing/
Okay, well then I'll go ahead and throw some info out there in the hopes that it's useful to you.
What I can tell you is that the cluster we share time on has 8-core machines with 16GB of RAM each, and they're sufficient for most of our needs. We don't do much assembly, but we do do a ton of other genomic processing, ranging from mapping short reads all the way up to SNP calling and pathway inference. I also still do a fair amount of array processing.
Using most cluster management tools (PBS, LSF, whatever), it should be possible to allow a user to reserve more than one CPU per node, effectively giving them up to 16 GB for a single process if they reserve the whole node. Yeah, that means some lost cycles, but I don't find I need it that often - 2GB is still sufficient for most things I run. It'd also be good to set up a handful of machines with a whole lot of RAM - maybe 64GB? That gives users who are doing things like assembly or loading huge networks into RAM some options.
I more often run into limits on I/O. Giving each machine a reasonably sized scratch disk and encouraging your users to make smart use of it is a good idea. Network filesystems can get bogged down really quickly when a few dozen nodes are all reading and writing data. If you're going to be doing lots of really I/O-intensive stuff (and dealing with short reads, you probably will be), it's probably worth looking into faster hard drives. Certainly 7200RPM, if not 10k. Last time I looked, 15k drives were available but not worth it in terms of price/performance. That may have changed.
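One concrete way to follow the scratch-disk advice from inside R: keep the heavy intermediate files on the node-local disk and copy only the final result back to shared storage. A sketch, assuming the scheduler exposes local scratch through a TMPDIR-style environment variable (the /scratch fallback and the paths below are hypothetical):

    # Prefer node-local scratch over the shared network filesystem.
    scratch <- Sys.getenv("TMPDIR", unset = "/scratch")
    workdir <- file.path(scratch, paste0("job_", Sys.getpid()))
    dir.create(workdir, recursive = TRUE, showWarnings = FALSE)

    # Write bulky intermediates locally...
    intermediate <- file.path(workdir, "normalized_intensities.rds")
    saveRDS(matrix(rnorm(1e6), ncol = 10), intermediate)

    # ...then copy just the final result back and clean up.
    file.copy(intermediate, "/shared/project/results/", overwrite = TRUE)
    unlink(workdir, recursive = TRUE)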
I won't get into super-detail on the specs - you'll have to price that out and see where the sweet spot is. I also won't tell you how many nodes to get, because again, that depends on your funding. I will say that if you're talking about a small cluster for a small lab, it may make sense to just get 3 or 4 machines with 32 cores and a bunch of RAM, and not worry about trying to set up a shared filesystem, queue, etc. - it really can be a headache to maintain. If you'll be supporting a larger userbase, though, then you may find a better price point at fewer CPUs per node, and have potentially fewer problems with disk I/O (because you'll have fewer CPUs per hard drive).
People who know more about cluster maintenance and hardware than I do, feel free to chime in with additions or corrections.