How I set up my working environment for daily work
As a full-time computational biologist, I spend almost all of my working time in front of my desktop. As one of the earliest PhD students at PICB, I am often asked by new students for advice on setting up their working environment.
Here are my assumptions about the conditions we face:
- One or several servers, possibly even a cluster, on which you unfortunately don’t have root privileges;
- One desktop, Linux or Windows, with root privileges;
- A laptop, Linux or Windows, with root privileges.
In the real world, what I have is:
- Some clusters and some servers, without root privileges;
- One Dell desktop running Windows 8.1, with an i7 CPU and a 256 GB SSD;
- One Lenovo X201i laptop running Windows 8.1, with an i3 CPU and a 256 GB SSD.
Applications that work on both Windows and Linux
- JabRef
- Dropbox
- Copy
- Atom and Sublime Text. As my editor I use Atom, developed by GitHub, which has very good Markdown support. Another favorite of mine is Sublime Text, but it is commercial.
- R and RStudio.
RStudio can be installed from the pre-built binaries downloaded from its official website.
Windows desktop
Working in a Windows environment is always an issue for a bioinformatician: Linux dominates scientific computing, but most people still need Windows because it is the dominant desktop OS. I list my workarounds here, but the ultimate solution is a Linux VM or a Linux desktop.
- PuTTY is the most widely used SSH and Telnet client for Windows. I use PuTTYTray, an improved version of PuTTY.
- I don’t like PuTTY’s default display settings, which look ugly in ncurses applications such as mc, so I use the Tango theme. To use it:
  - Import the putty tango theme.reg file into the Windows registry.
  - When setting up a new PuTTY session, first load tango theme, then edit Host Name, Port and any other settings you want, change the session name, and save it as a new session.
- For file transfer I use WinSCP, an FTP and SFTP client with a Total Commander-like UI.
- Besides Atom and Sublime Text, Notepad++ is also good.
Ultimately, a real bioinformatician needs a Linux VM or a Linux desktop. Both VirtualBox and VMware Player are free for non-commercial use. I use VMware Player because with VirtualBox I sometimes could not install a 64-bit Linux guest on a 64-bit Windows 8 host.
Linux VM or Desktop
Currently I run 64-bit Ubuntu 14.04 LTS on VMware Player. Many programs can be installed from Ubuntu PPAs. To save time, add all the repositories first and then run apt-get update only once.
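A minimal sketch of that batching idea as a bash function, to be pasted into ~/.bashrc or sourced. The name add_ppas_then_update and the RUN variable are hypothetical, not part of Ubuntu: RUN defaults to sudo, and RUN=echo turns the function into a dry run that only prints the commands it would execute.

```shell
# Hypothetical helper (illustration only): add several PPAs first,
# then refresh the package index a single time at the end.
# RUN defaults to sudo; RUN=echo prints the commands instead (dry run).
add_ppas_then_update() {
    local run=${RUN:-sudo}
    local ppa
    for ppa in "$@"; do
        # -y answers the confirmation prompt of add-apt-repository
        $run add-apt-repository -y "$ppa" || return 1
    done
    # One update covers all the newly added sources
    $run apt-get update
}
```

For example, add_ppas_then_update ppa:marutter/rrutter ppa:webupd8team/java ppa:webupd8team/atom ppa:synapse-core/testing, followed by one sudo apt-get install for all the packages listed below.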
- Midnight Commander, a Norton Commander clone, is my favorite file manager. On Ubuntu it can be installed directly from the Software Center. In the menu (press F9), go to Options => Panel options and check Lynx-like motion; this lets you change directories with the arrow keys.
- To install R from a PPA on Ubuntu:
sudo add-apt-repository ppa:marutter/rrutter
sudo apt-get update
sudo apt-get install r-base r-base-dev
- To install Flash:
sudo add-apt-repository ppa:nilarimogard/webupd8
sudo apt-get update
sudo apt-get install freshplayerplugin
- To install Oracle Java:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
sudo apt-get install oracle-java8-set-default
- I love Synapse, an application launcher for Linux. To install it:
sudo apt-add-repository ppa:synapse-core/testing
sudo apt-get update
sudo apt-get install synapse
- Atom as the editor, from the WebUpd8 PPA:
sudo add-apt-repository ppa:webupd8team/atom
sudo apt-get update
sudo apt-get install atom
- git.
- R and RStudio. To install Shiny Server:
sudo su - \
-c "R -e \"install.packages('shiny', repos='http://cran.rstudio.com/')\""
sudo apt-get install gdebi-core
wget http://download3.rstudio.org/ubuntu-12.04/x86_64/shiny-server-1.2.1.362-amd64.deb
sudo gdebi shiny-server-1.2.1.362-amd64.deb
To save space in the /home folder (for example, when it sits on a small SSD), we can set R_LIBS to another folder, such as one on a remote file server. In that case, the setting has to go into both ~/.bash_profile (for the R console started from a shell) and ~/.Renviron (which R itself reads at startup, so it also covers RStudio).
In ~/.bash_profile, add:
export R_LIBS=/media/your_remote_fileserver/RLIBS
And in ~/.Renviron, add:
R_LIBS=/media/your_remote_fileserver/RLIBS
Now new R packages are installed on the remote file server by default, which saves a lot of space in your home folder!
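If you set this up on several machines, the two edits can be scripted. Below is a minimal sketch; the function name set_r_libs is hypothetical, and /media/your_remote_fileserver/RLIBS is the same placeholder path as above. Each line is appended only if it is not already present, so rerunning it is harmless.

```shell
# Hypothetical helper (illustration only): point R_LIBS at a directory
# outside /home, for both login shells and R itself.
set_r_libs() {
    local libdir=$1 line
    # R drops library paths that do not exist, so create the folder first
    mkdir -p "$libdir" || return 1
    # ~/.bash_profile uses shell syntax, hence "export"
    line="export R_LIBS=$libdir"
    grep -qxF "$line" "$HOME/.bash_profile" 2>/dev/null ||
        printf '%s\n' "$line" >> "$HOME/.bash_profile"
    # ~/.Renviron uses plain NAME=value lines, read by R at startup
    line="R_LIBS=$libdir"
    grep -qxF "$line" "$HOME/.Renviron" 2>/dev/null ||
        printf '%s\n' "$line" >> "$HOME/.Renviron"
}
```

Usage: set_r_libs /media/your_remote_fileserver/RLIBS, then open a new login shell (or restart R) so the setting takes effect.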
Connecting the Linux desktop to remote Linux servers
- Generate an SSH key on the local desktop for remote connections.
> ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/ny.shao/.ssh/id_rsa):
Created directory '/home/ny.shao/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/ny.shao/.ssh/id_rsa.
Your public key has been saved in /home/ny.shao/.ssh/id_rsa.pub.
The key fingerprint is:
xxxxxxxxxxxxxxxxxxxxx ny.shao@ubuntu
The key's randomart image is:
+--[ RSA 2048]----+
| |
| |
| |
| |
+-----------------+
Once your desktop has a key, you can keep using it; there is no need to run ssh-keygen again.
- Copy your desktop's public key to the remote servers:
ssh-copy-id user@remote-server.edu
If your remote server uses a different port, say 12345:
ssh-copy-id -p 12345 user@remote-server.edu
- Create a config file under ~/.ssh on the local Linux desktop:
cd ~/.ssh
touch config
Then open the config file in your preferred editor and add content like this:
Host *
ServerAliveInterval 120
Host your_server_ssh
ControlMaster auto
ControlPath /tmp/ssh_mux_%h_p_%r
HostName your.ssh.server.edu
User user
RemoteForward 52698 localhost:52698
Port you_port
The first section sets a “heartbeat” for all SSH connections: after 120 seconds without traffic, the client sends a keep-alive message so the connection is not dropped.
In the next section, user is your user name and you_port is the SSH port, if the server uses a non-standard one; with the default port 22, the Port line can be omitted. ControlMaster and ControlPath let further sessions to the same host share the first connection instead of logging in again. The RemoteForward of port 52698 is for rmate, so that files opened on the server appear in the local Atom or Sublime Text editor. You may change this port, but then don’t forget to change the port number in rmate too.
If your company or school requires logging in through a “gate” node, you can use this instead:
Host gate
ControlMaster auto
ControlPath /tmp/ssh_mux_%h_p_%r
HostName gate.your_school.edu
User user
Host your_server_ssh
ProxyCommand ssh -q gate nc -q0 server1.your_school.edu 22
User user
RemoteForward 52698 localhost:52698
Then, to log in to your server over SSH:
ssh -XC your_server_ssh
- Install the remote-atom package in Atom on the local Linux desktop: Ctrl+Shift+P => Install Packages. For Sublime Text, the rsub package needs to be installed instead.
- Put rmate somewhere in the $PATH of the remote server; I generally put it in ~/opt/app/rmate:
mkdir -p ~/opt/app/rmate
wget --no-check-certificate -O ~/opt/app/rmate/rmate https://raw.github.com/aurora/rmate/master/rmate
chmod +x ~/opt/app/rmate/rmate
Later I will edit .bashrc so that paths under ~/opt/app are added to $PATH automatically.
- Now you may try:
rmate test.txt
You should then see test.txt in your local Atom window. If you get:
connect_to localhost port 52698: failed.
then you need to start the rmate server in Atom: Packages => Remote Atom => Start Server.
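The ssh-keygen step at the start of this list asks its questions interactively. When setting up a new desktop from a script, something like the following sketch creates the key only if it does not exist yet, in line with the advice to generate it once and reuse it. The function name ensure_ssh_key is hypothetical; -t, -b, -f, -N and -q are standard ssh-keygen options. Note that -N "" creates a key without a passphrase; drop it to be prompted for one.

```shell
# Hypothetical helper (illustration only): create an RSA key pair once,
# and leave an existing key untouched.
ensure_ssh_key() {
    local key=${1:-$HOME/.ssh/id_rsa}
    local dir
    if [ -f "$key" ]; then
        echo "key exists: $key"
        return 0
    fi
    dir=$(dirname "$key")
    # Keep the key directory private (700), as is conventional for ~/.ssh
    mkdir -p "$dir" && chmod 700 "$dir" || return 1
    # -N "" means no passphrase; remove it to be asked for one
    ssh-keygen -t rsa -b 4096 -f "$key" -N "" -q
}
```

After it succeeds, copy the public key with ssh-copy-id as described above.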
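With many servers, editing ~/.ssh/config by hand gets error-prone. Below is a minimal sketch of a bash function that appends a Host stanza like the one above unless that alias is already defined. The function name add_ssh_host and its argument order are hypothetical; the options it writes are the ones used in this post, and the Host * heartbeat section is left to be added by hand.

```shell
# Hypothetical helper (illustration only): append a Host stanza to
# ~/.ssh/config. Arguments: alias hostname user [port, default 22].
add_ssh_host() {
    local name=$1 host=$2 user=$3 port=${4:-22}
    local cfg="$HOME/.ssh/config"
    mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh" || return 1
    # Refuse to add a second stanza for an alias that is already defined
    if awk -v a="$name" '$1 == "Host" { for (i = 2; i <= NF; i++) if ($i == a) found = 1 }
                         END { exit (found ? 0 : 1) }' "$cfg" 2>/dev/null; then
        echo "Host $name already exists in $cfg" >&2
        return 1
    fi
    # %h and %r are expanded by ssh, not by the shell
    cat >> "$cfg" <<EOF
Host $name
    HostName $host
    User $user
    Port $port
    ControlMaster auto
    ControlPath /tmp/ssh_mux_%h_p_%r
    RemoteForward 52698 localhost:52698
EOF
    # ssh rejects a config file that others can write to
    chmod 600 "$cfg"
}
```

For example, add_ssh_host your_server_ssh your.ssh.server.edu user, after which ssh -XC your_server_ssh works as above.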
Linux server setup
- Setup at login. I prefer bash, but the split between .bashrc and .bash_profile is always an issue. To save time, I just put this at the end of my .bash_profile:
if [ -f ~/.bashrc ]; then
    source ~/.bashrc
fi
Then I added this snippet:
export HOME2="$HOME/opt"
if [ -d "${HOME2}/app" ]; then
    # add app/<name>/bin if it exists, otherwise app/<name> itself
    for j in "${HOME2}"/app/*; do
        if [ -d "${j}/bin" ]; then
            export PATH="${j}/bin:$PATH"
        elif [ -d "${j}" ]; then
            export PATH="${j}:$PATH"
        fi
    done
fi
So every subfolder of ~/opt/app is added to $PATH, or its bin subfolder if it has one.
export R_LIBS=${HOME2}/Rpack
I also move my R package library to ~/opt/Rpack to make it easy to keep track of.
- For third-party software, I prefer pkgsrc as a ready-made package management system. Like apt-get on Debian/Ubuntu, it can install thousands of programs, but without root privileges. Download the newest version, unpack it, say into ~/opt/tmp/pkg_source_installation_date, and run from its bootstrap subdirectory:
./bootstrap --prefix ~/opt/pkg_installation_date --unprivileged
Afterwards, add ~/opt/pkg_installation_date/bin to your $PATH, and that’s it!
When you want to add a program, say tmux, just go to ~/opt/tmp/pkg_source_installation_date/misc/tmux and run:
bmake
bmake install
The programs I suggest installing are htop, tmux, midnight commander, eog, xpdf and emacs. To check whether a program is available in pkgsrc, just search for it on pkgsrc.se.
- Python. There are many ways to install Python packages when you don’t have root:
  - pip install --user your_package. But I don’t recommend it.
  - Virtualenv. It is powerful, but I don’t use it.
  - Enthought Canopy, a pre-built Python bundle that can be installed under your home folder and is free for academic users. But I don’t use it now.
  - Anaconda, another pre-built Python bundle, which I am using now.
I use Anaconda or Canopy because they bundle many math packages, such as numpy, scipy and matplotlib; installing these manually is tedious and troublesome.
- Perl. I use ActivePerl.
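The pkgsrc build step described above can be wrapped in a small function so that adding a program becomes a one-liner. A minimal sketch; the function name pkgsrc_install and the PKGSRC_SRC variable are hypothetical, and the default source path is the placeholder folder used above. It assumes bmake from the bootstrap is already on your $PATH.

```shell
# Hypothetical helper (illustration only): build and install one pkgsrc
# package, given as category/name (e.g. misc/tmux).
pkgsrc_install() {
    local src=${PKGSRC_SRC:-$HOME/opt/tmp/pkg_source_installation_date}
    local dir="$src/$1"
    if [ -z "$1" ] || [ ! -d "$dir" ]; then
        echo "no such package directory: $dir" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged
    ( cd "$dir" && bmake && bmake install )
}
```

Usage: pkgsrc_install misc/tmux. Stopping at the first failing bmake keeps a broken build from being half-installed.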
Misc
- For R, some packages are so useful that I install them everywhere:
install.packages(c("ggplot2", "reshape2", "plyr", "stringr", "sqldf", "data.table", "doMC", "caTools"), dependencies=TRUE)
source("http://bioconductor.org/biocLite.R")
biocLite("BiocInstaller")
biocLite(c("BSgenome", "Rsamtools", "ShortRead", "GenomicRanges", "DESeq2", "DEXSeq", "edgeR"))
- If you get errors when installing RCurl and XML in R, install the corresponding system development libraries first. On Ubuntu:
sudo apt-get install libcurl4-openssl-dev
sudo apt-get install libxml2-dev
On CentOS:
sudo yum -y install curl
sudo yum -y install libcurl libcurl-devel
sudo yum -y install libxml2 libxml2-devel
- Some references:
- For CRAN packages, please refer to this post.
- For Next Generation Sequencing data analysis, please read this guide on Bioconductor.
- Some people prefer to edit their ~/.Rprofile to customize their R environment, e.g. options(stringsAsFactors=FALSE) (I do hate this stupid default option!), but this is dangerous for the portability of your code: the same script may behave differently on a machine without that ~/.Rprofile.
- Discussions about Anaconda, Canopy and manually compiled Python can be found here.
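The Ubuntu and CentOS commands for the RCurl/XML libraries above can be folded into one function that picks the right package manager. A minimal sketch; the function name install_rcurl_xml_deps and the RUN/PKG_MGR variables are hypothetical (RUN=echo gives a dry run, PKG_MGR=apt-get or PKG_MGR=yum skips auto-detection), and the package lists are exactly the ones given above.

```shell
# Hypothetical helper (illustration only): install the system libraries
# that the R packages RCurl and XML need to compile.
# RUN=echo prints the command instead of running it;
# PKG_MGR=apt-get or PKG_MGR=yum skips auto-detection.
install_rcurl_xml_deps() {
    local run=${RUN:-sudo}
    local mgr=${PKG_MGR:-}
    if [ -z "$mgr" ]; then
        if command -v apt-get >/dev/null 2>&1; then
            mgr=apt-get
        elif command -v yum >/dev/null 2>&1; then
            mgr=yum
        fi
    fi
    case $mgr in
        apt-get) $run apt-get install -y libcurl4-openssl-dev libxml2-dev ;;
        yum)     $run yum -y install curl libcurl libcurl-devel libxml2 libxml2-devel ;;
        *)       echo "unsupported package manager: ${mgr:-none found}" >&2; return 1 ;;
    esac
}
```

Run it once, then retry install.packages(c("RCurl", "XML")) in R.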