Sunday, June 26, 2011

Splitting file based on line numbers

http://www.expertsheaven.com/split-file-based-on-line-numbers-in-unix/

This script is useful when you need to split a huge file by line or record ranges. Normal file splitters available on the market split a file by size (bytes, KB, MB) and cannot split on a given number of lines or records.
Steps to use the script:
  1. Save the below script as lsplit.ksh

    #!/bin/ksh
    # lsplit.ksh - split a file into pieces based on the line ranges
    # listed in a properties file (one "startLine,endLine" pair per line).
    propDIR=./
    propFile=$propDIR/SSNRange.txt.prop
    inpFile=$1
    date
    startLineNo=1
    count=1
    while read line
    do
        # each record in the properties file is "startLine,endLine"
        startLineNo=`echo $line | cut -f1 -d,`
        endLineNo=`echo $line | cut -f2 -d,`
        if [ "$endLineNo" != "" -a "$startLineNo" != "" ]; then
            echo "Cut here from $startLineNo to $endLineNo"
            # write the requested line range to a numbered split file
            sed -n "$startLineNo","$endLineNo"p $inpFile > $inpFile.split.$count
            count=`expr $count + 1`
        fi
    done < $propFile
    date
  2. Create a properties file SSNRange.txt.prop containing the line (or record) ranges, one start,end pair per line. For example:
    1,400
    401,1504
    1505,7000
  3. Run the script
    $ lsplit.ksh infile.txt
  4. Three output files will be created:
    • infile.txt.split.1 –> contains lines 1 to 400
    • infile.txt.split.2 –> contains lines 401 to 1504
    • infile.txt.split.3 –> contains lines 1505 to 7000
Advantages of this script:
  • The file is split based on line numbers or records.
  • No manual editing is required to correct the first and last records.
  • Easy to handle in batch processing.

Saturday, June 25, 2011

plot for genomic data

1. Vector graphics editors, such as CorelDRAW, Adobe Illustrator, Xara Xtreme, Macromedia FreeHand, Adobe Fireworks, Inkscape or SK1.

2. http://www.ncrna.org/idiographica/
3. http://osfinder.dna.bio.keio.ac.jp/synteny_map.html
4. http://circos.ca/

Khan Academy - Watch. Practice. Learn almost anything—for free

http://www.khanacademy.org/#biology

There are videos on Science, Math, the Humanities and other subjects.

Thursday, June 23, 2011

grep, egrep, fgrep and regular expressions

fgrep searches files for one or more pattern arguments. It does not use regular expressions; instead, it does direct string comparison to find matching lines of text in the input.
egrep works in a similar way, but uses extended regular expression matching (as well as the \< and \> metacharacters) as described in the regexp reference page.
Note:
When using the man utility to view the regexp reference page, use the following command to ensure that you get the correct reference page:
man 5 regexp
If you include special characters in patterns typed on the command line, escape them by enclosing them in apostrophes to prevent inadvertent misinterpretation by the shell or command interpreter. To match a character that is special to egrep, put a backslash (\) in front of the character. It is usually simpler to use fgrep when you don't need special pattern matching.
grep is a combination of fgrep and egrep. If you do not specify either -E or -F, (or their long form equivalents, --extended-regexp or --fixed-strings), grep behaves like egrep, but matches basic regular expressions instead of extended ones. You can specify a pattern to search for with either the -e or -f option. If you specify neither option, grep (or egrep or fgrep) takes the first non-option argument as the pattern for which to search. If grep finds a line that matches a pattern, it displays the entire line. If you specify multiple input files, the name of the current file precedes each output line.
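
A few quick illustrations of the differences (notes.txt is just a placeholder file name):

# fgrep does fixed-string matching, so the dot here is a literal dot
fgrep 'a.b' notes.txt
# grep defaults to basic regular expressions: the dot matches any character, so "axb" also matches
grep 'a.b' notes.txt
# egrep uses extended regular expressions, so alternation and + are available
egrep 'cat|dog' notes.txt
egrep '[0-9]+' notes.txt
# quote the pattern so the shell leaves it alone, and backslash-escape characters that are special to egrep
egrep 'total: \$[0-9]+' notes.txt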

http://www.mkssoftware.com/docs/man1/grep.1.asp

http://www.mkssoftware.com/docs/man5/regexp.5.asp

Create a Screenshot on Mac OS

Cmd+Ctrl+Shift+3 = Screenshot to clipboard
Cmd+Ctrl+Shift+4 = Selected screenshot to clipboard

Cmd+Shift+3 = Screenshot to desktop
Captures the entire desktop to a file on the desktop named 'Picture #'. This captures the whole screen; if you want just one window, you will have to crop the image with image-editing software.

Cmd+Shift+4 = Selected screenshot to desktop
Lets you select a specific part of the screen with the mouse. The pointer turns into a crosshair; hold down the mouse button and drag to select the part of the screen you want. When you release the button, that part of the screen is captured. Press 'Esc' to cancel.

Cmd+Shift+4, then press Spacebar
Lets you select which window to capture.

Sunday, June 19, 2011

Friday, June 17, 2011

R - a cool image editor

http://img269.imageshack.us/img269/6540/negn.png

plot points, regression line and residuals with R

http://statistic-on-air.blogspot.com/2011/06/how-to-plot-points-regression-line-and.html

Polynomial regression techniques with R

http://statistic-on-air.blogspot.com/2009/09/polynomial-regression-techniques.html

Latin squares design in R

http://statistic-on-air.blogspot.com/2010/01/latin-squares-design-in-r.html

Thursday, June 16, 2011

The Database for Annotation, Visualization and Integrated Discovery (DAVID)

http://david.abcc.ncifcrf.gov/

Submit a list of genes and get the functional annotation of those genes.

Tuesday, June 14, 2011

merge (join) two files based on two columns

Merge (join) two files based on two columns: each row is uniquely identified by the two shared columns, and all columns from both files are joined (merged) wherever the values in those two shared columns are the same.

#####################################################################
# file 1
cat test
1    200    T    C    1    500
1    539702    C    C    1    1501
1    539703    T    T    1    1502
1    539704    T    T    1    1503
1    539705    A    A    1    1504
1    539706    T    C    1    1505
1    539707    G    A    1    1506
1    539708    A    A    1    1507
1    539709    G    G    1    1508
1    539710    C    -    1    1509
2    99    M    T    3    1000
# file 2
head -n20 test2
1    539702    C    C    1    1501
1    539703    T    T    1    1502
1    539704    T    T    1    1503
1    539705    A    A    1    1504
1    539706    T    C    1    1505
1    539707    G    A    1    1506
1    539708    A    A    1    1507
1    539709    G    G    1    1508
1    539710    C    -    1    1509
1    539714    A    G    1    1510
1    539715    A    A    1    1511
1    539716    A    A    1    1512
#
nawk 'FNR==NR{f1[$1,$2]=$0;next}{idx=$1 SUBSEP $2; if(idx in f1) $0=f1[idx] OFS $0}1' test test2
1    539702    C    C    1    1501 1    539702    C    C    1    1501
1    539703    T    T    1    1502 1    539703    T    T    1    1502
1    539704    T    T    1    1503 1    539704    T    T    1    1503
1    539705    A    A    1    1504 1    539705    A    A    1    1504
1    539706    T    C    1    1505 1    539706    T    C    1    1505
1    539707    G    A    1    1506 1    539707    G    A    1    1506
1    539708    A    A    1    1507 1    539708    A    A    1    1507
1    539709    G    G    1    1508 1    539709    G    G    1    1508
1    539710    C    -    1    1509 1    539710    C    -    1    1509
1    539714    A    G    1    1510
1    539715    A    A    1    1511
1    539716    A    A    1    1512
1    539717    G    G    1    1513
1    539718    C    T    1    1514
1    539719    T    C    1    1515
#
nawk 'FNR==NR{f1[$1,$2]=$0;next}{idx=$1 SUBSEP $2; if(idx in f1) $0=f1[idx] OFS $0}1' test test2 | awk '$7'
1    539702    C    C    1    1501 1    539702    C    C    1    1501
1    539703    T    T    1    1502 1    539703    T    T    1    1502
1    539704    T    T    1    1503 1    539704    T    T    1    1503
1    539705    A    A    1    1504 1    539705    A    A    1    1504
1    539706    T    C    1    1505 1    539706    T    C    1    1505
1    539707    G    A    1    1506 1    539707    G    A    1    1506
1    539708    A    A    1    1507 1    539708    A    A    1    1507
1    539709    G    G    1    1508 1    539709    G    G    1    1508
1    539710    C    -    1    1509 1    539710    C    -    1    1509
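
An alternative sketch using sort and join instead of awk (this is an inner join, so rows present in only one of the files are dropped; it assumes a shell with process substitution, such as bash):
# build a composite key from columns 1 and 2, sort on it, join on it, then strip the key again
join <(awk '{print $1":"$2, $0}' test  | sort -k1,1) \
     <(awk '{print $1":"$2, $0}' test2 | sort -k1,1) | cut -d' ' -f2-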

http://www.unix.com/unix-dummies-questions-answers/158593-merging-two-files-based-two-columns-make-third-file.html

new compression tools for genomic data

A novel compression tool for efficient storage of genome resequencing data


When its performance was tested on the first Korean personal genome sequence data set, GRS was able to achieve ∼159-fold compression, reducing the size of the data from 2986.8 to 18.8 MB. While being tested against the sequencing data from rice and Arabidopsis thaliana, GRS compressed the 361.0 MB rice genome data to 4.4 MB, and the A. thaliana genome data from 115.1 MB to 6.5 KB. This de novo compression tool is available at http://gmdd.shgmo.org/Computational-Biology/GRS.

Table 4. Performance of GRS in compressing the A. thaliana genome of TAIR9 using TAIR8 as the reference

Chromosome       Varied sequence (%)   Raw file size (MB)   Compressed file size   Compression rate
1                0.016314              29.4                 715.0 B                43116.3
2                0.036145              19.0                 385.0 B                51747.9
3                0.046910              22.7                 2.9 KB                 6709.0
4                0.000301              17.9                 1.9 KB                 9647.2
5                0.063888              26.1                 604.0 B                45311.0
Whole genome     0.032712              115.1                6.5 KB                 18132.7
  • The varied sequence percentage of each chromosome, the sizes of the raw and compressed sequence files, and the compression rate are shown.


http://nar.oxfordjournals.org/content/early/2011/01/25/nar.gkr009.full

Monday, June 13, 2011

unix - copy

From: http://en.wikipedia.org/wiki/Cp_%28Unix%29

Examples

To make a copy of a file in the current directory, enter:
cp prog.c prog.bak
This copies prog.c to prog.bak. If the prog.bak file does not already exist, the cp command creates it. If it does exist, the cp command replaces its contents with the contents of the prog.c file.

To copy a file in your current directory into another directory, enter:
cp jones /home/nick/clients
This copies the jones file to /home/nick/clients/jones.

To copy a file to a new file and preserve the modification date, time, and access control list associated with the source file, enter:
cp -p smith smith.jr
This copies the smith file to the smith.jr file. Instead of creating the file with the current date and time stamp, the system gives the smith.jr file the same date and time as the smith file. The smith.jr file also inherits the smith file's access control protection.

To copy all the files in a directory to a new directory, enter:
cp /home/janet/clients/* /home/nick/customers
This copies only the files in the clients directory to the customers directory.

To copy a directory, including all its files and subdirectories, to another directory, enter:
cp -R /home/nick/clients /home/nick/customers
This copies the clients directory, including all its files, subdirectories, and the files in those subdirectories, to the customers/clients directory. Be careful about including a trailing slash in the source directory, however. If you run cp -R /home/nick/clients/ /home/nick/customers on a GNU-based system, it does the same thing as without the slash; however, if you run the same thing on a BSD-based system, it will copy all the contents of the "clients" directory over, instead of the "clients" directory itself.

To copy a specific set of files to another directory, enter:
cp jones lewis smith /home/nick/clients
This copies the jones, lewis, and smith files in your current working directory to the /home/nick/clients directory.

To use pattern-matching characters to copy files, enter:
cp programs/*.c .
This copies the files in the programs directory that end with .c to the current directory, signified by the single . (dot). You must type a space between the c and the final dot.
Copying a file to an existing file is done by opening the existing file in update mode which requires write access and results in the target file retaining the permissions it had originally.
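
A quick way to see this behaviour (the file names are placeholders):
echo "new contents" > source.txt
touch target.txt
chmod 600 target.txt
cp source.txt target.txt     # overwrites the contents of target.txt in place
ls -l target.txt             # target.txt still has its original 600 permissions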

unix - transpose a file

Here are some scripts for transposing a file (rows to columns):

http://stackoverflow.com/questions/1729824/transpose-a-file-in-bash

I find this awk approach helpful:

#############################################
awk '
{
    # store every field, indexed by (row number, column number)
    for (i=1; i<=NF; i++)  {
        a[NR,i] = $i
    }
}
# keep track of the widest row
NF>p { p = NF }
END {
    # print column j of the input as row j of the output
    for(j=1; j<=p; j++) {
        str=a[1,j]
        for(i=2; i<=NR; i++){
            str=str" "a[i,j];
        }
        print str
    }
}' file

Friday, June 10, 2011

curl in Mac and wget in Linux

wget is used on Linux,

curl -O is used on the Mac (which ships curl but not wget).

Both are for downloading a file from the Internet.
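For example (the URL is just a placeholder):
# Linux
wget http://example.com/data.tar.gz
# Mac OS X: -O tells curl to save the file under its remote name instead of writing it to stdout
curl -O http://example.com/data.tar.gz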

R - Video Tutorial for Spatial Statistics

http://www.fabioveronesi.net/rtutorial.html

R - colors

http://thedatamonkey.blogspot.com/search?updated-max=2011-06-09T15%3A18%3A00-04%3A00&max-results=15


R: Colors

"It ain't easy being green. ~ Kermit T. F.

Matt Blackwell at the SSSB has made it easy to access all the Craylola(tm) colors in R.

And in case you're not familiar with the way R handles color, here are a few resources:

* The best color chart for R.
* Color palettes in R (allows plotting a spectrum or coordinated palette of colors easily).

Thursday, June 9, 2011

R - Drop factor levels in a dataset

R: Drop factor levels in a dataset

R has factors, which are very cool (and somewhat analogous to labeled levels in Stata). Unfortunately, the list of factor levels sticks around even if you remove some data such that no examples of a particular level still exist.

# Create some fake data
x <- as.factor(sample(head(colors()),100,replace=TRUE))
levels(x)
x <- x[x!="aliceblue"]
levels(x) # still the same levels
table(x) # even though one level has 0 entries!

The solution is simple: run factor() again:
x <- factor(x)
levels(x)

If you need to do this on many factors at once (as is the case with a data.frame containing several columns of factors), use drop.levels() from the gdata package:
x <- x[x!="antiquewhite1"]
df <- data.frame(a=x,b=x,c=x)
df <- drop.levels(df)

Now I'm going to quit monkeying around and get to sleep.

From:
http://thedatamonkey.blogspot.com/

Gene Network construction… web based tool

http://biostar.stackexchange.com/questions/8905/gene-network-construction-web-based-tool

10 R One Liners to Impress Your Friends

http://datadebrief.blogspot.com/2011/06/10-r-one-liners-to-impress-your-friends.html


Multiply Each Item in a List by 2



#lists
lapply(list(1:4),function(n){n*2})
# otherwise
(1:4)*2

Sum a List of Numbers


#lists
lapply(list(1:4),sum)
 
# otherwise
sum(unlist(list(1:4))) # or simply
sum(1:4)
Verify if Exists in a String


wordlist = c("lambda", "data", "plot", "statistics", "R")
tweet = c("R is an integrated suite of software facilities for data manipulation, calculation and graphical display")
wordlist[wordlist %in% (c(unlist(strsplit(tweet,' ', fixed=T))))]


Read in a File


readLines("data.file", n=-1)

Happy Birthday to You!


lapply((1:4),function(x){ paste(c("Happy Birthday to ", ifelse(x!=3, "you", "dearName")), sep="", collapse="")})


Filter list of numbers


n = c(49, 58, 76, 82, 88, 90); c(list(n[which(n<=60)]),list(n[which(n>60)]))
Fetch and Parse an XML web service
 

library('XML'); xmlParseDoc('http://search.twitter.com/search.atom?&q=R-Project', asText=F)

Find minimum (or maximum) in a List


# for lists
rapply(list(c(14, 35, -7, 46, 98)), min, classes="numeric", how="replace")
# otherwise
min(unlist(list(14, 35, -7, 46, 98))) # or simply
min(c(14, 35, -7, 46, 98))
max(c(14, 35, -7, 46, 98))

Parallel Processing


# copy from Section 4 An example doSMP session
library(doSMP); w <- startWorkers(workerCount = 4); registerDoSMP(w); foreach(i = 1:3) %dopar% sqrt(i)

Sieve of Eratosthenes


##ok, this one is a little cheating
library('spuRs'); primesieve(c(),2:50)

Tuesday, June 7, 2011

libsequence and the softwares it supports

http://molpopgen.org/software/lseqsoftware.html

Software dealing with sequence analysis:
analysis - C++ software for evolutionary genetic analysis. This package also requires the GNU Scientific Library to be installed. Many Linux distros provide GSL packages, and OS X users can install it using either the fink or darwinports projects, according to their preference. (I prefer darwinports, for what that's worth.) However, to make life easier on yourself, I recommend that OS X users install the GSL directly from the source code available at the GSL homepage. The reason for this is that I have not modified the build system to be able to deal with lib directories other than /usr/local/lib and /usr/lib. The GSL is used to calculate chi-squared probabilities for the program MKtest. If you're not aware of it, the GSL is a C library for numeric computation, essentially a modern version of "Numerical Recipes in C".
There are manpages for several of the programs in the analysis package (these may be out of date; up-to-date versions will be installed with the packages themselves):
  1. compute, a "mini-DNAsp" for the Unix command line
  2. gestimator, Ka/Ks by Comeron's method
  3. kimura80, to calculate divergence using Kimura's (1980) method
  4. polydNdS, to analyze silent and replacement polymorphism
  5. MKtest, to perform McDonald and Kreitman tests
  6. rsq, to summarize linkage disequilibrium in data
  7. descPoly, a program to output a qualitative summary of features of sequence polymorphism data
  8. sharedPoly, a program to calculate number of shared polymorphisms between 2 partitions of an alignment
sequtils - software for sequence manipulation.
manpages are available online for the following programs in the sequtils package (these may be out of date; up-to-date versions will be installed with the packages themselves):
  1. clustalwtofasta
  2. revcom
  3. toLDhat
  4. trimallgaps
Software dealing with analysis of coalescent simulation:
msstats Reads in data from Hudson's coalescent simulation program ms and calculates several common summary statistics. The output is a tab-delimited list of statistics, with a header line so that the file can be easily processed in R.
example usage: ms 50 10000 -t 20 | msstats
msff Applies a frequency filter to the output of Dick Hudson's coalescent simulation. Using the -m flag, it filters on the minor allele frequency. Use -d to filter on the derived allele frequency. The filtered data are printed to stdout. The frequency filter removes sites where the relevant frequency is less than or equal to the input value. Frequencies are input as decimals on the interval [0,1]. For example, to calculate LD-related statistics using my msld package, but filtering out sites where the minor allele frequency is less than or equal to 10% in the sample:
ms 10 10000 -t 20 -r 20 1000 | msff -m 0.10 | msld > out
rhothetapost Estimate mutation and recombination rates from multilocus polymorphism data. Described in Haddrill et al. (2005) and Thornton and Andolfatto (2006). Documentation is here
omega Calculates Kim and Nielsen's (2004, Genetics 167:1513) "omega_max" statistic which was explored in Jensen et al. (2007, Genetics 176 2371-3279). Please read the source code for documentation. Both Kim and Nielsen and Jensen et al. should be cited if this code is used--the first for the statistic, the latter for the implementation.

Monday, June 6, 2011

climate and biomes of the world

(1) Climatic Zone and soil type layers for GIS:
http://eusoils.jrc.ec.europa.eu/projects/RenewableEnergy/

(2) http://www.blueplanetbiomes.org/
Here is good documentation of the climate and biomes of our planet, Earth.

Sunday, June 5, 2011

Omics data and pathway - tools

Integrating Omics data for signaling pathways, interactome reconstruction, and functional analysis

http://gettinggeneticsdone.blogspot.com/2011/06/resources-for-pathway-analysis.html

 

ssh

1. Preface
The benefits of ssh hardly need repeating here, do they?
In short, the old rpc commands and telnet can all be replaced by ssh.
For example, these common tasks:
- Remote login
ssh user@remote.machine
- Remote execution
ssh user@remote.machine 'command ...'
- Remote copy
scp user@remote.machine:/remote/path /local/path
scp /local/path user@remote.machine:/remote/path
- X forwarding
ssh -X user@remote.machine
xcommand ...
- Tunnel / port forwarding
ssh -L 1234:remote.machine:4321 user@remote.machine
ssh -R 1234:local.machine:4321 user@remote.machine
ssh -L 1234:other.machine:4321 user@remote.machine
 

2. Implementation
 
1) Disable root login
# vi /etc/ssh/sshd_config
PermitRootLogin no
2) Disable password login and force RSA key authentication (assuming the ssh account is user1)
# vi /etc/ssh/sshd_config
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile   .ssh/authorized_keys
PasswordAuthentication no
# service sshd restart
# su - user1
$ mkdir ~/.ssh 2>/dev/null
$ chmod 700 ~/.ssh
$ touch ~/.ssh/authorized_keys
$ chmod 644 ~/.ssh/authorized_keys
--------------------------------------------------
Switch to the client side:
$ ssh-keygen -t rsa
(Press Enter three times to finish; no passphrase is needed unless you will use ssh-agent.)
$ scp ~/.ssh/id_rsa.pub user1@server.machine:id_rsa.pub
(For a Windows client, you can generate the public key with puttygen.exe,
then copy it to the server and edit it so that its content is a single line.)
---------------------------------------------------
Back on the server side:
$ cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
$ rm ~/id_rsa.pub
$ exit
3) Restrict the su / sudo list:
# vi /etc/pam.d/su
auth required /lib/security/$ISA/pam_wheel.so use_uid
# visudo
%wheel ALL=(ALL)   ALL
# gpasswd -a user1 wheel
4) Restrict the list of ssh users
# vi /etc/pam.d/sshd
auth     required   pam_listfile.so item=user sense=allow file=/etc/ssh_users onerr=fail
# echo user1 >> /etc/ssh_users
5) Block ssh connections and control access through a web-managed list
# iptables -I INPUT -p tcp --dport 22 -j DROP
# mkdir /var/www/html/ssh_open
# cat > /var/www/html/ssh_open/.htaccess <<END
AuthName "ssh_open"
AuthUserFile /var/www/html/ssh_open/.htpasswd
AuthType basic
require valid-user
END
# htpasswd -c /var/www/html/ssh_open/.htpasswd user1
(It is best to also set up SSL, or better yet allow https connections only; I skip the SSL setup here, please fill it in yourself.)
(If you need to control where connections come from, add Allow/Deny entries as well; also left to the reader.)
# cat > /var/www/html/ssh_open/ssh_open.php <<'END'
<?php
//Set dir path for ip list
$dir_path=".";
//Set filename for ip list
$ip_list="ssh_open.txt";
//Get client ip
$user_ip=$_SERVER['REMOTE_ADDR'];
//allow specifying ip if needed
if (@$_GET['myip']) {
$user_ip=$_GET['myip'];
}
//checking IP format
if ($user_ip==long2ip(ip2long($user_ip))) {
//Put client ip to a file
if(@!($file = fopen("$dir_path/$ip_list","w+")))
{
    echo "Permission denied!!
";
    echo "Pls Check your rights to dir $dir_path or file $ip_list";
}
else
{
    fputs($file,"$user_ip");
    fclose($file);
    echo "client ip($user_ip) has put into $dir_path/$ip_list";
}
} else {
echo "Invalid IP format!!
ssh_open.txt was not changed.";
}
?>
END
# touch /var/www/html/ssh_open/ssh_open.txt
# chmod 640 /var/www/html/ssh_open
# cat >> /etc/crontab <<END
5 * * * *   root   /etc/iptables/sshopen.sh clear
END
---------------------------
Switch to the client side
Enter this in the browser URL bar:
http://server.machine/ssh_open/ssh_open.php?myip=1.2.3.4
(If ?myip=1.2.3.4 is not given, the client's current IP is used, assuming the request does not go through a proxy.)
This way, the ssh_open.txt file on the server holds only a single record, overwritten each time.
Then:
$ telnet server.machine 1234
You then have at most 5 minutes to connect to the server with ssh!
---------------------------
The basic idea of this step is as follows:
5.1) Block all sshd connections in the firewall.
5.2) Set up a directory under httpd, optionally protected with SSL + htpasswd + allow/deny control,
and inside it a PHP page that records the browser's IP in a .txt text file.
Depending on how you write it, you can grab the browser's IP automatically or let the browser pass it in as a parameter.
The text file holds only a single record, overwritten each time.
5.3) Edit /etc/services, add a new entry (e.g. xxx) and assign it a new port (e.g. 1234).
5.4) Have xinetd listen on that port and launch another script that sets up iptables, taking the IP from the list in step 5.2 and opening the ssh connection for it.
5.5) Set a crontab entry to clear the ssh-related iptables rules every few minutes. This does not affect existing connections; if you connect again after the timeout, the above repeats.
 
6) If you did not set up the previous step and are worried that too many people will try your ssh service:
# cat > /etc/firewall/sshblock.sh <<'END'
#!/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
LOG_FILE=/var/log/secure
KEY_WORD="Illegal user"
KEY_WORD1="Failed password for root"
PERM_LIST=/etc/firewall/bad.list.perm
LIMIT=5
MAIL_TO=root
IPT_SAV="$(iptables-save)"
bad_list=$(egrep "$KEY_WORD" $LOG_FILE | awk '{print $NF}' | xargs)
bad_list1=$(egrep "$KEY_WORD1" $LOG_FILE | awk '{print $11}' | xargs)
bad_list="$bad_list $bad_list1"
for i in $(echo -e "${bad_list// /\n}" | sort -u)
do
    hit=$(echo $bad_list | egrep -o "$i" | wc -l)
    [ "$hit" -ge "$LIMIT" ] && {
          echo "$IPT_SAV" | grep -q "$i .*-j DROP" || {
              echo -e "\n$i was dropped on $(date)\n" | mail -s "DROP by ${0##*/}: $i" $MAIL_TO
              iptables -I INPUT -s $i -j DROP
          }
          egrep -q "^$i$" $PERM_LIST || echo $i >> $PERM_LIST
    }
done
END
# chmod +x /etc/firewall/sshblock.sh
# cat >> /etc/hosts.allow <<END
sshd: ALL: spawn ( /etc/firewall/sshblock.sh )& : ALLOW
END
This way, anyone blindly trying SSH gets at most 5 attempts (LIMIT is adjustable) and is then blocked.
In addition, the IPs in PERM_LIST can also be fed to your iptables startup script for a permanent block:
for i in $(< $PERM_LIST)
do
    /sbin/iptables -I INPUT -s $i -j DROP
done  
7) Also, if you want to know who is doing a full-range port scan against you:
# iptables -I INPUT -p tcp --dport 79 -j ACCEPT
# cat > /etc/xinetd.d/finger <<END
service finger
{
    socket_type   = stream
    wait         = no
    user         = nobody
    server       = /usr/sbin/in.fingerd
    disable       = no
}
END
# cat >> /etc/hosts.allow <<'END'
in.fingerd: ALL : spawn ( echo -e "\nWARNING %a was trying finger.\n$(date)" | mail -s "finger from %a" root ) & : DENY
END
Here I only set it to send mail to root.
In fact, you could change it to trigger the firewall and ban the returned %a value instead.
But of course, if the other side does a selective port scan and never hits finger, this is useless...
 
 

SSH client commands
Submitted by amxku on June 14, 2006, 11:35 PM.
ssh -l user -p 22 upsdn.net
Enter the password and you are logged in.
-l login_name
Specifies the user to log in as on the remote machine. Without this option, typing just ssh host also works; it logs in as your current user. Example: ssh -l root http://www.upsdn.net
===================================================
-c blowfish|3des
Selects the cipher used to encrypt the session. The default is 3des; 3des (triple data encryption) encrypts-decrypts-encrypts with three different keys. blowfish is a fast block cipher; it is more secure and faster than 3des.
===================================================
-v
Verbose mode. Makes ssh print debugging messages about its progress, which is very helpful for debugging connection, authentication and configuration problems.
===================================================
-f
Asks ssh to run the command in the background. Use this when ssh needs to ask for a password or passphrase but you want it to run in the background; it is best to also add -l user. A typical use is starting X11 at a remote site, something like ssh -f host xterm.
===================================================
-i identity_file
Selects the file from which the RSA authentication identity is read. The default is .ssh/identity in the user's home directory.
===================================================
-n
Redirects stdin from /dev/null (in practice, prevents reading from stdin). Must be used when ssh runs in the background. A common trick is to use this option to run X11 programs on a remote machine; for example, ssh -n shadows.cs.hut.fi emacs & starts emacs on shadows.cs.hut.fi, and the X11 connection is automatically forwarded over the encrypted channel, with the ssh program put in the background. (This does not work if ssh needs to ask for a password.)
===================================================
-t
Forces pseudo-tty allocation. This lets you run arbitrary screen-based programs on the remote machine, for example menu services.
===================================================
-C
Requests compression of all data (including stdin, stdout, stderr, and X11 and TCP/IP connections). The compression algorithm is the same one gzip uses, but the compression level cannot be controlled. Compression is a good choice over modems or other slow links, but on a fast network it will only slow things down.
=====================================================
-p port
Connects to this port on the remote machine. Without this option the default is 22.
======================================================
-P
Uses a non-privileged port for the outgoing connection. Use this if your firewall does not permit connections from privileged ports. Note that this option turns off RhostsAuthentication and RhostsRSAAuthentication.
=====================================================
-L listen-port:host:port
Maps a local port to a port on the remote machine's address.
====================================================
-R listen-port:host:port
Maps a remote port to a port on the local machine's address.
-2 Forces ssh to use protocol version 2.
-4 Forces ssh to use IPv4 addresses only.
-6 Forces ssh to use IPv6 addresses only.
=====================================================
-g
Allows remote hosts to connect to locally forwarded ports.
-a
Disables forwarding of the authentication agent connection.
-e character
Sets the escape character
 
scp: use scp to copy files to or from a remote machine
======================================================
Copy a local file to a remote machine
scp /etc/lilo.conf my@www.upsdn.net:/home/my
This copies the local file /etc/lilo.conf to www.upsdn.net, into the home directory of the user my.
=====================================================
Copy a file from a remote machine to the local machine
This copies the file /etc/lilo.conf on http://www.upsdn.net to the local /etc directory.
=====================================================
Keep the attributes of the files from the source host
ssh-keygen
Generates a public key and a private key to secure ssh connections.
When ssh connects to an sshd server, public keys are exchanged; the system checks the keys stored in /etc/ssh_know_hosts, and if the client is found, that key is used to produce a randomly generated session key which is sent to the server. Both ends then use this key to complete the remaining stages of ssh.
It produces two files, identity and identity.pub: the private key is stored in identity and the public key in identity.pub. Next, use scp to copy identity.pub to .ssh/authorized_keys in the home directory on the remote machine (this authorized_keys file is roughly the equivalent of the rhosts file used by the r-protocols); after that the user can log in without a password. RSA authentication is definitely safer and more reliable than rhosts authentication.
Run it:
If no passphrase was entered when the key pair was generated with ssh-keygen, then, as shown above, you can log in from http://www.upsdn.net to sohu.com without entering a password.
The passphrase entered here may be different from the account password, or it may be left empty.
 
SSH protocol version 1:
Each host can use RSA encryption to generate a 1024-bit RSA key; this RSA scheme is the algorithm used to generate the public and private keys. The connection encryption in protocol version 1 can be summarised in these steps:
1. Each time the SSH daemon (sshd) starts, it generates a 768-bit public key (also called the server key) which is kept on the server;
2. When a request comes in from a client, the server sends this public key to the client, and the client checks it against its own RSA records to confirm it;
3. After accepting this 768-bit server key, the client randomly generates its own 256-bit key (host key), combines the server key and host key into one complete key in encrypted form, and sends that key to the server;
4. From then on, the server and client use this 1024-bit key to transfer data for the rest of the connection.
Of course, since the client picks a random 256-bit key each time, the key for this connection will be different from the key for the next one!
 
==============================================
SSH protocol version 2:
Unlike version 1, version 2 no longer generates a server key. When a client connects to the server, the two sides use the Diffie-Hellman key exchange to derive a shared key, and then use an algorithm such as Blowfish for synchronized encryption and decryption.
Every sshd offers both protocol versions; which one is used is decided by the client when it connects. By default, version 2 is used. Because the connection data passes through this public/private key encryption and decryption, the transfer in between is of course much safer!
If you connect simply with ssh hostname, the account used to log in to hostname is the user account of your current environment. In the example above, since I am running as root, executing ssh host.domain.name makes host.domain.name ask me to confirm the password for a root login. To avoid this hassle, I usually log in to a remote host in the simple e-mail-like form, e.g. ssh user@hostname, which means logging in to hostname as the account user; you can also write it as -l username. After logging in, everything works just as it does on your local Linux machine, so it really is simple! This gives you remote control of the host. Note that, by default, SSH does allow you to log in as root. Also, when you connect to a host for the first time, the server will tell you that the key for this connection has not yet been established and ask whether you accept the key it sends and want to set up the connection; at that point be sure to type yes, not y or Y, otherwise the program will not accept it.
sftp -l username hostname, or sftp user@hostname
Once inside sftp, the operations are the same as in ordinary FTP mode:
cd
ls dir
mkdir
rmdir
pwd
chgrp
chown
chmod
ln oldname newname
rm path
rename oldname newname
exit

vi and vim

The vi editor is the most commonly used tool for creating and editing documents. Beginners should learn simple vi usage: making simple edits, deletions, insertions, searches and replacements in vi. If you are new to this, this article may help you learn basic vi operations in the shortest possible time.
Contents

+++++++++++++++++++++++++++++++++++++
Main text
+++++++++++++++++++++++++++++++++++++


1. About text editors

There are many text editors: graphical ones such as gedit, kwrite, OpenOffice..., and text-mode editors such as vi, vim (the enhanced version of vi) and nano... vi and vim are the editors we use most often in Linux. It is worth introducing the simplest usage of vi (vim) so that beginning Linux users can learn it in the shortest possible time.
The nano tool works much like edit under DOS and is simple to use; we will not cover it here, but try it if you are interested.

2. The vi editor

Why learn basic vi?
vi (or vim) is the most basic text editing tool on Linux. It lacks the point-and-click convenience of graphical editors, but for system and server administration it is something the graphical editors can never replace: when no X Window desktop environment is installed, or the desktop has crashed, we still need the character-mode editor vi.
vi (or vim) is the most efficient tool for creating and editing simple documents.

3. How to use the vi editor


3.1 How to start vi

[root@localhost ~]# vi  filename
~
~
~
~
~
~
~
~

3.2 The three modes of vi

Command mode, for entering commands;
Insert mode, for inserting text;
Visual mode, for visually highlighting and selecting text;

3.3 Saving files and quitting

Command mode is the default mode of vi and vim; from any other mode, press the ESC key to switch back to it.
After pressing ESC and then typing :, vi waits for a command at the bottom of the screen;
:w  save;
:w filename  save as filename;
:wq!  save and quit;
:wq! filename  note: save under the file name filename, then quit;
:q!  quit without saving;
:x  save and quit; same effect as :wq!

3.4 Moving the cursor

After pressing ESC to enter Command mode, the following keys move the cursor:
j  move down one line;
k  move up one line;

h  move left one character;
l  move right one character;

ctrl+b  move up one screen;
ctrl+f  move down one screen;

up arrow     move up;
down arrow   move down;
left arrow   move left;
right arrow  move right;
When editing a file, the j, k, l and h commands can also be prefixed with a number; for example, 3j moves down 3 lines.

3.5 Insert mode (inserting text)

i  insert before the cursor;
a  insert after the cursor;

I  insert at the beginning of the current line;
A  insert at the end of the current line;

O  open a new line above the current line;
o  open a new line below the current line;

s  delete the character under the cursor, then enter insert mode;
S  delete the current line, then enter insert mode;

3.6 Deleting text

x  delete one character;
#x  delete several characters, where # is a number, e.g. 3x;
dw  delete one word;
#dw  delete several words, where # is a number, e.g. 3dw deletes three words;
dd  delete one line;
#dd  delete several lines, where # is a number, e.g. 3dd deletes the cursor line and the two lines below it;
d$  delete from the cursor to the end of the line;

J  join the current line with the line below it, removing the line break between them;

3.7 Undoing changes and deletions

Undo a change or deletion:
Press ESC to return to Command mode, then press u to undo the previous deletion or change; to undo several earlier changes or deletions, press u several times. This is much like undo in Word.

3.8 Visual mode

In recent Linux distributions, vi provides a visual mode; this feature actually comes from vim, so if your vi does not have it, switch to vim. To open visual mode, press ESC and then v.
Visual mode gives a very friendly way to select a range of text, shown highlighted; at the bottom of the screen you will see:
-- VISUAL --
Once in visual mode, you can use the cursor-movement commands from command mode, described above, to select a range of text.
What is selecting a range of text good for?
We can delete the selected part: pressing d deletes the selected content.
After selecting content, press y to copy it or d to delete it;
It is worth noting that deleting also copies. Go back to command mode, move the cursor somewhere, then press shift+p, and the just-deleted content is pasted there. We only mention it here; it is covered in more detail below.
To leave visual mode, press ESC again.

3.9 Copy and paste

Deleting actually also acts as cutting: after deleting text, move the cursor somewhere and press shift+p to paste the content back in place; move the cursor elsewhere and press p or shift+p to paste it again;
p  paste after the cursor;
shift+p  paste before the cursor
An example:
Suppose we want to copy the third line of a document and paste it after the fifth line. What should we do?
There are two methods;
Method 1:
First delete the third line: move the cursor to line 3, use dd, then press shift+p once. This pastes the just-deleted third line back in its original place.
Then move the cursor down to the fifth line (with j) and press p once, which pastes the content of line 3 after line 5;
Method 2:
Enter visual mode (press ESC, then v). Move the cursor to select the content of line 3 and press y to copy it; then move the cursor to line 5 and press p;
So copying and pasting is a combined use of command mode, insert mode and visual mode; we must learn to switch between the modes (use the ESC key often) and, above all, learn to move the cursor in command mode.

3.10 About line numbers

Sometimes, when configuring a program, it fails with an error pointing at line X of the configuration file. That is when line-number operations come in handy.

Show line numbers for the whole file:

Press ESC, then type:
:set number
Cursor position
At the bottom right of the screen you will see something like:
         57,8          27%
Here 57 means line 57 and 8 means the 8th character;

3.11 Search and replace


3.11.1 Searching

First press ESC to enter command mode; then typing / or ? starts a search;
/SEARCH  note: search forward; press n to move the cursor to the next match;
?SEARCH  note: search backward; press shift+n to move the cursor to the next match
For example, to find the word swap in a file, do the following:
First press ESC to enter command mode, then type:
/swap

?swap

3.11.2 Replacing

Press ESC to enter command mode;
:s /SEARCH/REPLACE/g  note: on the line the cursor is on, replace the word SEARCH with REPLACE; all occurrences of SEARCH are highlighted;
:%s /SEARCH/REPLACE  note: replace every SEARCH in the document with REPLACE;
:#,# s /SEARCH/REPLACE/g  note: the # signs are line numbers, giving the range of lines in which SEARCH is replaced with REPLACE;
Note: here g means a global replacement; notice that SEARCH is highlighted even where nothing is replaced;
Examples:
Suppose we have a document to modify.
To replace every word the with THE on the line the cursor is on:
:s /the/THE/g
To replace every the in the whole document with THE:
:%s /the/THE
To replace the with THE only on lines 1 through 10:
:1,10  s /the/THE/g

4. About this article

My goal in writing this article is to let newcomers create, edit and modify files with vi or vim in the shortest possible time, so it is by no means a comprehensive vi manual. Describing every feature of vi would take a manual of at least a thousand pages, and this article does not cover the more advanced uses of vi either. To learn more, consult man and help.

5. Afterword

So far I have written several pieces on working with directories and files, from creating, deleting and copying them, to attribute operations, and finally to editing files. These documents are related; linked together they form one block of knowledge, and only by mastering it can we really manage the file system.
What will I write about next? Probably a supplement on finding files; there are similar documents already, so I will summarize them and post that for everyone.
The next step I have planned is a series on networking basics, which are important; that is the next focus.

6. References

man vi and vi --help

Distance-based methods for the analysis of maps produced by species distribution models

http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00115.x/abstract

Calculate different distances between maps produced by SDMs, and apply PCA to that comparison.

Bayesian analysis for species distribution modelling

http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2010.00077.x/full

Fine-scale environmental variation in species distribution modelling: regression dilution, latent variables and neighbourly advice

Genome Browser, Blat, and liftOver source - from UCSC Genomic Bioinformatics

UCSC Genomic Bioinformatics
http://hgdownload.cse.ucsc.edu/downloads.html

Source Downloads

  The Genome Browser, Blat, and liftOver source are freely downloadable for academic, noncommercial, and personal use. For information on commercial licensing, see the Genome Browser and Blat licensing requirements.

Saturday, June 4, 2011

Guidelines for estimating repeatability - ICC R package

http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00125.x/abstract

Can your effective sample size support the statistic you obtained? And how do you work out the effective sample size you need?

1. Researchers frequently take repeated measurements of individuals in a sample with the goal of quantifying the proportion of the total variation that can be attributed to variation among individuals vs. variation among measurements within individuals. The proportion of the variation attributed to variation among individuals is known as repeatability and is most frequently estimated as the intraclass correlation coefficient (ICC). The goal of our study is to provide guidelines for determining the sample size (number of individuals and number of measurements per individual) required to accurately estimate the ICC.
2. We report a range of ICCs from the literature and estimate 95% confidence intervals for these estimates. We introduce a predictive equation derived by Bonett (2002), and we test the assumptions of this equation through simulation. Finally, we create an R statistical package for the planning of experiments and estimation of ICCs.
3.  Repeatability estimates were reported in 1·5% of the articles published in the journals surveyed. Repeatabilities tended to be highest when the ICC was used to estimate measurement error and lowest when it was used to estimate repeatability of behavioural and physiological traits. Few authors report confidence intervals, but our estimated 95% confidence intervals for published ICCs generally indicated a low level of precision associated with these estimates. This survey demonstrates the need for a protocol to estimate repeatability.
4.  Analysis of the predictions from Bonett’s equation over a range of sample sizes, expected repeatabilities and desired confidence interval widths yields both analytical and intuitive guidelines for designing experiments to estimate repeatability. However, we find a tendency for the confidence interval to be underestimated by the equation when ICCs are high and overestimated when ICCs and the number of measurements per individual are low.
5. The sample size to use when estimating repeatability is a question pitting investigator effort against expected precision of the estimate. We offer guidelines that apply over a wide variety of ecological and evolutionary studies estimating repeatability, measurement error or heritability. Additionally, we provide the R package, icc, to facilitate analyses and determine the most economic use of resources when planning experiments to estimate repeatability.

investigate cryptic evolution

Cryptic Evolution: Does Environmental Deterioration Have a Genetic Basis?

http://www.genetics.org/content/187/4/1099.full#xref-ref-52-1

  1. Jarrod D. Hadfield,
  2. Alastair J. Wilson and
  3. Loeske E. B. Kruuk

 

Wednesday, June 1, 2011

simple unix command - uncompress files

 #######################
tar is an archiving tool used to create a Tape ARchive (the tar format itself does not compress). The resulting file is known as a tarball.

If you use Windows, this is similar to a Zip file: you use WinZip to compress and uncompress .zip files, so it is the same idea. To uncompress the files (or to get the files out of a tarball), you can use the following commands in Linux.

tar xvf filename.tar

If the tarball has also been gzipped (compressed), you can use the following command:
tar xvfz filename.tar.gz

If you only want certain directories from the tarball, do this:
tar xvzf filename.tar.gz */dir.you.want/*

If you have a .tar.bz2 file, then you need bzip2 installed (/usr/ports/archivers/bzip2), and you issue this command (note that GNU tar uses j rather than y for bzip2):
tar yxf filename.tar.bz2

################
gunzip file.tar.gz
tar -xvf file.tar

Alternatively you can use one single command to do all the work:

tar -zxvf file.tar.gz

The -z basically does the gunzip work for you. Hope that helps.

http://www.ozzu.com/unix-linux-forum/how-unix-command-line-uncompress-tar-t69733.html