About sorting big BED files
Sorting a big BED file is a frequent real-world problem.
There are many ways to do it, and the tools I prefer are bedtools and GNU sort.
bedtools provides the sortBed command (also callable as bedtools sort) for this.
For example:
sortBed -i A.bed > A_sorted.bed
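Whichever tool produced A_sorted.bed, GNU sort's check mode (-c) can confirm that the file is really ordered by chromosome and then by start. The sketch below assumes the default lexicographic chromosome order (no custom genome file); the helper name is_bed_sorted and the tiny test files are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check whether a BED file is ordered by chromosome, then start.
# is_bed_sorted is a hypothetical helper name, not part of bedtools.
set -euo pipefail

is_bed_sorted() {
  # -c       : only check the order, do not write sorted output
  # -s       : disable the last-resort whole-line comparison, so lines with
  #            equal chromosome and start are never reported as disorder
  # LC_ALL=C : compare chromosome names byte by byte (chr10 sorts before chr2)
  LC_ALL=C sort -c -s -k1,1 -k2,2n "$1" 2>/dev/null
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Two tiny hypothetical BED3 files: one ordered, one with the starts swapped.
printf 'chr1\t100\t200\nchr1\t1500\t1600\nchr10\t5\t9\nchr2\t50\t80\n' > "$tmp/ok.bed"
printf 'chr1\t1500\t1600\nchr1\t100\t200\n' > "$tmp/bad.bed"

for f in "$tmp/ok.bed" "$tmp/bad.bed"; do
  if is_bed_sorted "$f"; then
    echo "$(basename "$f"): sorted"
  else
    echo "$(basename "$f"): NOT sorted"
  fi
done
```

sort -c exits with status 0 for an ordered file and 1 otherwise, so the same check fits in a pipeline script before downstream tools that require sorted input.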
However, sortBed loads the whole file into memory, so it needs a server with a lot of RAM; that is why I sometimes use GNU sort instead. The sort utility in recent GNU coreutils can sort in parallel (--parallel) and use a large in-memory buffer (-S).
Our Linux servers are generally old RHEL releases or similarly old systems, so I usually install a relatively new coreutils from pkgsrc (pkgsrc_source/sysutils/coreutils). The new command is then named gsort, which distinguishes it from the system sort already in $PATH.
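Because the old system sort may not know the newer options, a script can prefer gsort when it is on PATH and check that the chosen binary accepts --parallel. This is a minimal sketch; pick_sort is a hypothetical helper name, and it assumes that a sort build without the option rejects it with a non-zero exit status, as GNU tools do for unknown options.

```shell
#!/usr/bin/env bash
# Sketch: prefer the pkgsrc-installed GNU sort (gsort) if it is on PATH,
# otherwise fall back to the system sort. pick_sort is a hypothetical name.
set -euo pipefail

pick_sort() {
  if command -v gsort >/dev/null 2>&1; then
    echo gsort
  else
    echo sort
  fi
}

SORT=$(pick_sort)

# Probe the option on a trivial input: a sort that does not know --parallel
# is expected to exit with a non-zero status.
if printf 'b\na\n' | "$SORT" --parallel=1 >/dev/null 2>&1; then
  echo "$SORT supports --parallel"
else
  echo "$SORT does not support --parallel; a newer coreutils is needed"
fi
```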
So a typical command is:
gsort --parallel=16 -S 20G -k1,1 -k2,2n -k6,6 A.bed > A_sorted.bed
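Here --parallel=16 lets gsort run up to 16 sorts concurrently (one per core), and -S 20G sets its main-memory buffer to 20 GB. The keys make gsort order lines by chromosome name first (-k1,1, compared as text), then by start position (-k2,2n, compared numerically), and then by strand (-k6,6). Writing each key as k,k limits it to that single field; a bare -k1 would extend to the end of the line.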
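To see what the key options do before running them on a 20 GB file, the same keys can be tried on a few lines. This sketch is an illustration, not the exact production command: it uses small hypothetical values (--parallel=2, -S 1M), a hypothetical helper sort_bed6, a made-up BED6 sample, and adds LC_ALL=C, which the original command does not set, so that chromosome names compare byte by byte regardless of the server's locale.

```shell
#!/usr/bin/env bash
# Sketch: the gsort key set on a tiny hypothetical BED6 sample.
set -euo pipefail

SORT=${SORT:-sort}   # set SORT=gsort to use the pkgsrc build

sort_bed6() {
  # -k1,1  : chromosome name only (text comparison)
  # -k2,2n : start position, compared numerically
  # -k6,6  : strand, compared as text ('+' sorts before '-' in the C locale)
  LC_ALL=C "$SORT" --parallel=2 -S 1M -k1,1 -k2,2n -k6,6 "$@"
}

printf '%s\n' \
  $'chr2\t100\t200\ta\t0\t+' \
  $'chr10\t50\t60\tb\t0\t-' \
  $'chr1\t300\t400\tc\t0\t+' \
  $'chr1\t20\t30\td\t0\t-' \
  $'chr1\t20\t30\te\t0\t+' | sort_bed6
```

The lines come out with names e, d, c, b, a: chr1 before chr10 before chr2 (text order, not natural order), start 20 before 300, and the two features at the same start ordered + before -. For real multi-gigabyte inputs, GNU sort writes temporary files to $TMPDIR or /tmp, so its -T option is useful when that directory is small.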