【5.4.2】codonw
I. Introduction
A tool dating from 2005; it can be installed and run locally on Linux.
http://codonw.sourceforge.net/index.html
Calculates the codon usage indices
- CAI: Codon adaptation index
- Fop: Frequency of optimal codons
- Nc: Effective number of codons
- CBI: Codon Bias Index
Calculates amino acid indices
- GRAVY score
- Aromaticity
Calculates correspondence analysis of
- Codon Usage
- RSCU (Relative synonymous codon usage)
- Amino acid usage
Correspondence analysis
- can include/exclude codons/amino acids
- can generate detailed reports of trends
- attempts to identify optimal codons automatically
- can allow additional data sets to be added
- records any number of trends
Calculates gene parameters
- Gene length
- GC, GC3s and codon position specific G+C
- Dinucleotide composition (in all three codon frames)
- Amino acid usage
- Relative amino acid usage
- Codon usage
- Relative Synonymous codon usage
II. Installation
cd /data/user/sam/project/codon_optimization/lib
wget https://sourceforge.net/projects/codonw/files/codonw/SourceCode-1.4.4%28zip%29/CodonWSourceCode_1_4_4.zip
unzip CodonWSourceCode_1_4_4.zip
cd codonw
III. Usage
Getting help:
codonw -help
codonw
Generate the codon usage of the sequences:
codonw input.dat -nomenu input.out input.blk
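By default, codonw writes the codon usage of each gene to the file input.blk. Since there is nothing wrong with this dataset, there should be no warning messages. However, analysing an earlier version of the dataset, based on EMBL release 50 in which SCCHRIII had 230 annotated ORFs, produces these typical warning messages: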
Warning: Sequence 178 "SCCHRIII.PE178______" does not begin with a recognised start codon
Warning: Sequence 178 "SCCHRIII.PE178______" is not terminated by a stop codon
Warning: Sequence 202 "SCCHRIII.PE202______" does not begin with a recognised start codon
Warning: Sequence 202 "SCCHRIII.PE202______" has 1 internal stop codon(s)
Warning: Sequence 202 "SCCHRIII.PE202______" is not terminated by a stop codon
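The three kinds of warning above can be reproduced with a small check before running codonw. This is only a sketch, not CodonW's own logic: the function name check_orf is hypothetical, the universal genetic code is assumed, and ATG is assumed to be the only accepted start codon (CodonW's actual rule may differ).

```python
# Hypothetical helper that mirrors the three warning types shown above.
# Assumptions: universal genetic code; only ATG is treated as a start codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG"}

def check_orf(seq):
    """Return a list of warning strings for one coding sequence."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    warnings = []
    if not codons or codons[0] not in START_CODONS:
        warnings.append("does not begin with a recognised start codon")
    internal = sum(1 for c in codons[:-1] if c in STOP_CODONS)
    if internal:
        warnings.append(f"has {internal} internal stop codon(s)")
    if not codons or codons[-1] not in STOP_CODONS:
        warnings.append("is not terminated by a stop codon")
    return warnings

print(check_orf("ATGGCTTAA"))      # clean ORF: []
print(check_orf("GCTTAAGCTGCT"))   # no start, one internal stop, no terminal stop
```

Sequences that return a non-empty list here are the ones likely to trigger warnings in codonw.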
input.dat is the input sequence file, available at: http://codonw.sourceforge.net/input.dat
Example of the input.blk output:
Phe UUU   27 1.38   Ser UCU    4 0.71   Tyr UAU    6 1.00   Cys UGU    4 1.14
    UUC   12 0.62       UCC    8 1.41       UAC    6 1.00       UGC    3 0.86
Leu UUA   12 1.24       UCA    7 1.24   TER UAA    0 0.00   TER UGA    0 0.00
    UUG    9 0.93       UCG    4 0.71       UAG    1 3.00   Trp UGG    5 1.00
    CUU   19 1.97   Pro CCU    5 1.43   His CAU    3 1.00   Arg CGU    3 1.50
    CUC    7 0.72       CCC    0 0.00       CAC    3 1.00       CGC    0 0.00
    CUA    9 0.93       CCA    7 2.00   Gln CAA   11 1.69       CGA    1 0.50
    CUG    2 0.21       CCG    2 0.57       CAG    2 0.31       CGG    0 0.00
Ile AUU   19 1.33   Thr ACU    5 0.71   Asn AAU   12 1.20   Ser AGU    8 1.41
    AUC   10 0.70       ACC    8 1.14       AAC    8 0.80       AGC    3 0.53
    AUA   14 0.98       ACA   10 1.43   Lys AAA   13 1.08   Arg AGA    5 2.50
Met AUG   14 1.00       ACG    5 0.71       AAG   11 0.92       AGG    3 1.50
Val GUU   16 2.00   Ala GCU   10 1.33   Asp GAU    8 1.07   Gly GGU   21 2.00
    GUC    5 0.62       GCC    9 1.20       GAC    7 0.93       GGC    3 0.29
    GUA    7 0.88       GCA    9 1.20   Glu GAA    7 1.40       GGA   10 0.95
    GUG    4 0.50       GCG    2 0.27       GAG    3 0.60       GGG    8 0.76
459 codons in YCG9_Probable___ (used Universal Genetic code)
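In each cell of the table above, the first number is the codon count and the second is the RSCU. RSCU is the observed count divided by the mean count of all codons coding for the same amino acid. A minimal sketch (the function name rscu is mine, not part of CodonW) that reproduces the Phe values from the table:

```python
def rscu(counts):
    """RSCU for one synonymous family: observed count / mean count in the family."""
    total = sum(counts.values())
    if total == 0:
        # Amino acid absent from the gene: report 0.0 for every codon.
        return {codon: 0.0 for codon in counts}
    mean = total / len(counts)
    return {codon: n / mean for codon, n in counts.items()}

# Phe counts taken from the input.blk example above.
phe = rscu({"UUU": 27, "UUC": 12})
print({codon: round(value, 2) for codon, value in phe.items()})
# → {'UUU': 1.38, 'UUC': 0.62}
```

A value above 1 means the codon is used more often than expected under equal usage of the synonymous codons; a value below 1 means it is used less often.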
Example of the input.out output:
title T3s C3s A3s G3s CAI CBI Fop Nc GC3s GC L_sym L_aa Gravy Aromo
YCG9_Probable__________13 0.4337 0.2347 0.3588 0.1852 0.123 0.075 0.446 54.09 0.335 0.394 439 458 0.610699 0.122271
YCG8________573_residues_ 0.2876 0.3595 0.4222 0.1875 0.100 0.020 0.394 52.46 0.439 0.446 180 190 -0.211579 0.084211
ALPHA2________633_residue 0.3636 0.2273 0.4939 0.2177 0.109 -0.034 0.397 58.73 0.328 0.351 204 210 -0.667143 0.052381
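For downstream work it is often convenient to load this table into Python. The sketch below assumes that the file is whitespace-delimited, that the first row is the header shown above, and that titles contain no spaces (as in this example); the function name parse_out is hypothetical, and real output files may need extra handling.

```python
def parse_out(text):
    """Parse whitespace-delimited index rows into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    records = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            continue  # skip rows that do not match the header layout
        rec = {"title": fields[0]}
        for name, value in zip(header[1:], fields[1:]):
            try:
                rec[name] = float(value)
            except ValueError:
                rec[name] = None  # non-numeric entry, kept as missing
        records.append(rec)
    return records

# Two rows copied from the example above.
sample = """title T3s C3s A3s G3s CAI CBI Fop Nc GC3s GC L_sym L_aa Gravy Aromo
YCG9_Probable__________13 0.4337 0.2347 0.3588 0.1852 0.123 0.075 0.446 54.09 0.335 0.394 439 458 0.610699 0.122271
YCG8________573_residues_ 0.2876 0.3595 0.4222 0.1875 0.100 0.020 0.394 52.46 0.439 0.446 180 190 -0.211579 0.084211
"""
for rec in parse_out(sample):
    print(rec["title"], rec["CAI"], rec["Nc"])
```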
Codon usage indices
codonw input.dat -all_indices -c_type 2 -f_type 4 -nomenu
- -c_type 2: selects the reference organism for CAI; 2 stands for Saccharomyces cerevisiae
- -f_type 4: selects the reference organism for Fop/CBI; 4 stands for Saccharomyces cerevisiae
- -all_indices: computes all codon usage indices, including T3s, C3s, A3s, G3s, CAI, CBI, Fop, Nc, GC3s, GC, L_sym, L_aa, Gravy and Aromaticity
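To give a feel for what the CAI column measures: CAI is conventionally defined as the geometric mean of the relative adaptiveness values (w) of the codons in a gene, where w comes from a reference set of highly expressed genes. The sketch below is not CodonW's implementation; the function name cai and the values in w_demo are made up for illustration and are not the built-in S. cerevisiae table.

```python
import math

def cai(codons, w):
    """Geometric mean of w over the codons that have an entry in w.

    Codons missing from w (e.g. Met, Trp, stop codons) are skipped.
    All w values are assumed to be strictly positive.
    """
    logs = [math.log(w[c]) for c in codons if c in w]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))

# Hypothetical relative adaptiveness values, for illustration only.
w_demo = {"UUU": 0.5, "UUC": 1.0, "GGU": 1.0, "GGC": 0.25}
gene = ["AUG", "UUU", "GGU", "UUC", "GGC"]   # AUG is not in w_demo, so it is skipped
print(round(cai(gene, w_demo), 3))
# → 0.595
```

A gene built only from the best codon of each family would score 1.0; the more rare codons it contains, the lower the value.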
Codon usage totalled over all sequences (codonw labels it "Average of genes"):
codonw input.dat -nomenu -cutot input.out input.coa
Result:
Phe UUU 1483 1.14   Ser UCU 1094 1.47   Tyr UAU 1000 1.12   Cys UGU  434 1.18
    UUC 1117 0.86       UCC  773 1.04       UAC  789 0.88       UGC  303 0.82
Leu UUA 1349 1.55       UCA  882 1.19   TER UAA   47 1.27   TER UGA   36 0.97
    UUG 1549 1.78       UCG  487 0.66       UAG   28 0.76   Trp UGG  665 1.00
    CUU  698 0.80   Pro CCU  747 1.27   His CAU  677 1.15   Arg CGU  328 0.86
    CUC  364 0.42       CCC  415 0.71       CAC  499 0.85       CGC  171 0.45
    CUA  671 0.77       CCA  911 1.55   Gln CAA 1388 1.35       CGA  151 0.39
    CUG  604 0.69       CCG  281 0.48       CAG  668 0.65       CGG  103 0.27
Ile AUU 1612 1.35   Thr ACU 1052 1.38   Asn AAU 1778 1.17   Ser AGU  717 0.97
    AUC 1018 0.85       ACC  660 0.87       AAC 1262 0.83       AGC  500 0.67
    AUA  943 0.79       ACA  883 1.16   Lys AAA 2118 1.13   Arg AGA 1038 2.71
Met AUG 1156 1.00       ACG  444 0.58       AAG 1645 0.87       AGG  504 1.32
Val GUU 1184 1.49   Ala GCU 1055 1.40   Asp GAU 1905 1.25   Gly GGU 1284 1.87
    GUC  674 0.85       GCC  765 1.01       GAC 1145 0.75       GGC  552 0.80
    GUA  622 0.78       GCA  836 1.11   Glu GAA 2371 1.41       GGA  557 0.81
    GUG  690 0.87       GCG  368 0.49       GAG  995 0.59       GGG  355 0.52
53400 codons in Average of genes (used Universal Genetic code)
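The counts in this table are sums over every gene in the input. The same kind of tally can be sketched in a few lines of Python; this is an illustration of the idea rather than CodonW's code, the function name total_codon_usage is hypothetical, and the two toy genes are invented.

```python
from collections import Counter

def total_codon_usage(sequences):
    """Sum codon counts over several in-frame coding sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper().replace("T", "U")   # report codons as RNA, like codonw
        usable = len(seq) - len(seq) % 3      # drop a trailing partial codon
        counts.update(seq[i:i + 3] for i in range(0, usable, 3))
    return counts

# Two invented toy genes, four codons each.
genes = ["ATGTTTTTCTAA", "ATGTTTGGTTAG"]
totals = total_codon_usage(genes)
print(totals["UUU"], totals["AUG"], sum(totals.values()))
# → 2 2 8
```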
Correspondence Analysis (COA)
codonw input.dat -coa_cu -nomenu -silent
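This generates a correspondence analysis (COA) of codon usage. The summary file is "summary.coa", which contains most of the data produced by the COA. A series of other .coa files is generated as well.
Use the codon COA tables generated from the gene sequences in the previous step: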
codonw input.dat -fop_file fop.coa -nomenu
codonw input.dat -cai_file cai.coa -cbi_file cbi.coa -nomenu result2.out result2.blk
For Cricetulus griseus, search NCBI for the relevant genome sequence ( https://www.ncbi.nlm.nih.gov/nuccore/NC_007936.1?report=fasta ); three versions are available: complete, coding and gene.
cd /data/user/sam/project/codon_optimization/lib/codonW/genome/CHO_coding
codonw CHO_coding.fasta -coa_cu -nomenu -silent
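IV. Errors
Error 1: running the COA on the complete-genome FASTA, which codonw reads as a single sequence (see "Number of sequences: 1" in the output)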
[sam@g02 CHO_complete]$ codonw CHO_complete.fasta -coa_cu -nomenu -silent
Welcome to CodonW for Help type h
Warning: Sequence 1 "NC_007936.1_Cricetul" does not begin with a recognised start codon
Warning: Sequence 1 "NC_007936.1_Cricetul" has 385 internal stop codon(s)
Warning: Sequence 1 "NC_007936.1_Cricetul" is not terminated by a stop codon
Number of sequences: 1
WARNING 1 sequences had internal stop codons WARNING
Generating correspondence analysis
Problems with the number genes used for fop adjusting to 1 gene
Sequence 1 "NC_007936.1_Cricetul" contains no amino acids with 2 synonymous codons
--Nc was not calculated
WARNING An attempt to calculate CAI relative adaptivnesss FAILED
no Phe amino acids found in the high bias dataset
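The log above reports "Number of sequences: 1", so a quick way to catch this situation before running the COA is to count the FASTA records in the input. The sketch below uses a hand-written parser; the names fasta_records and looks_like_gene_set and the min_genes threshold are hypothetical, and the toy records are invented.

```python
def fasta_records(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def looks_like_gene_set(text, min_genes=2):
    """True if the file holds at least min_genes records (illustrative threshold)."""
    return len(fasta_records(text)) >= min_genes

whole_genome = ">NC_007936.1 toy stand-in\nACGTACGT\nACGT\n"
coding_set = ">cds1\nATGGCTTAA\n>cds2\nATGTTTTAG\n"
print(looks_like_gene_set(whole_genome), looks_like_gene_set(coding_set))
# → False True
```

A file that fails this check is probably a whole-genome record rather than a set of coding sequences, and a file of individual coding sequences is the more appropriate input for the COA.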
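I have a personal WeChat public account, though I am lazy and rarely update it. You can post questions there; if I am slow to reply, email me at tiehan@sina.cn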