forked from soedinglab/hh-suite
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathCHANGES
505 lines (355 loc) · 22.7 KB
/
CHANGES
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
Bugs? Ideas? Need for improvement?
soeding@mpibpc.mpg.de
_______________________________________________________________________________
HH-suite 2.1.0 March 2013
* Introduced a new format for hhblits databases: the prefilter flat files
*.cs219 containing the column-state sequences for prefiltering are replaced
by a pair of ffindex files. This leads to a speed up for reading the prefilter
db by ~4 seconds, improved scaling of hhblits with the number of cores, and
lower memory requirements. For compatibility with older versions of HH-suite,
the *.cs219 files will be still be provided with new db versions.
* Improved sensitivity and alignment accuracy through introduction of a new,
discriminative method to calculate pseudocounts, described in Angermuller C,
Biegert A, and Soding J, Bioinformatics 28: 3240-3247 (2012).
* Increased speed of hhsearch about 8-fold and of HHblits about 1.5-fold by
implementing the HMM-HMM Viterbi alingment in SSE2 SIMD (single input
multiple data) instructions
* Added option -macins to hhblits, hhsearch and hhalign that controls the
costs of internal gap positions in the Maximum Accuracy Algorithm.
0: gap cost is half the mact value (default) 1: no gap cost, leads to very
gappy alignments.
* Reduced memory requirements (in Viterbi algorithm by a factor 3,
in MAC algorithm by a factor 4, in total by a factor >5), improved -maxmem
option to more accurately reflect total memory required
* Added section to hhsuite user guide on
- how to efficiently run hhblits, in particular on a cluster
- how to modify or extend existing databases
- how to visually check multiple sequence alignments
- troubleshooting
_______________________________________________________________________________
HH-suite 2.0.16 February 2013
* Added options to hhblitsdb.pl for adding (-a) and removing (-r) files
from HH-suite database. Updeated help list and did stdin and stdout fixes.
* Added options -i stdin and -o stdout to cstranslate to read from stdin and
write to stdout.
* hhblits: Fixed -i stdin, -oa3m stdout and -v 0.
* hhsearch/hhalign: Fixed small bug that removed all sequences from output
alignment when trying to use append.
* Set default number of threads to 2 for HHblits. (Before, prefiltering was
done with 2 threads, but HMM-HMM comparison with only 1.)
* Improved error messages for memory problems
* Removed bug in hhblits that would skip realignment of first three hits if
their E-value was above the one given by -e <Evalue>
* Removed bug in filtering with -neff option that led to a single sequence
if -neff target value was within 0.01 of input MSA Neff value; implemented
speed up for -neff filtering.
* Removed bug that led to occasional segementation faults with the -atab option
_______________________________________________________________________________
HH-suite 2.0.15 June 2012
* Fixed a bug in the counting of the number of sequences found.
* Removed a regression in -nodiff option: sequences in query MSA were filtered by
normal, active filters such as -diff and -id. Only db MSAs were not filtered.
* Fix hhalign regression introduced in 2.0.14.
* Removed a regression that read sequences from query HHM or <query>.a3m even
when a sequence file query.seq was given.
* Renamed option -nodiff to -all. The name -nodiff is very misleading as
the diff option is not actually switched off on the query and db alignments.
* Improved the filtering section of the help lists
* Fixed problem for small -e <E-value> thesholds by declaring par.e as double
* Refactored some of the code in hhblits.
_______________________________________________________________________________
HH-suite 2.0.14 May 2012
* All error messages are now written to stderr (not stdout) and are of the form
"Error in <program_name>: <error description>"
* Improved error handling in hhblitsdb.pl, addss.pl, multithread.pl
* Added more flexible format recognition in cstranslate binary which is called
in hhblistdb.pl
* Added -nodiff and -diff to standard help
* In hhblitsdb.pl we force the MSA files to have extension a3m now. Same for
hhm (HHsuite model) and hmm (HMMER model) files.
_______________________________________________________________________________
HH-suite 2.0.13 February 2012
* Added script splitfasta.pl to split multiple-sequence FASTA file in multiple
single-sequence FASTA files. These are needed to run hhblits in parallel
using multithread.pl.
* Implemented -maxmem option specifying maximum memory in GB. This changes the
maximum length, up to which database sequences will be realigned using the
memory-hungry maximum accuracy algorithm. This change will lead to more accurate
results for longer HMMs (>5000 match states).
* more sensible memory size defaults in init_prefilter(), fixes a segfault
* Improved error messages if databases are too big (e.g. >2GB on 32bit systems).
* Made code segfault-safe when name lengths and other strings exceed usual
sizes. Now, all strings should be cut at maximum length. E.g., name lengths
of sequences and HMMs at >=512 characters (const int NAMELEN in util.C)
* Merged some Makefile changes for Debian from Laszlo Kajan
* Corrected calculation of "Identities" for pairwise query-template alignments.
Now, gaps '-' in aligned match states of query and template master sequences
are not counted as identical residue anymore. (hhfullalignment.C, line 182)
* Added detailed description in user guide of how Identities, Similarity,
and Sum_probs are calculated
* Fixed bug in hhblitsdb.pl (printf on line 272) introduced in 2.0.12
_______________________________________________________________________________
HH-suite 2.0.12 February 2012
* Reworked SSE flags: Support compilation without SSE3 (make NO_SSE3=1), without
SSE2 (make NO_SSE2=1) with SSE4.1 (make WITH_SSE41=1) and enable all other
instructions avalable on compilation host (make WITH_NATIVE=1).
* hhblitsdb.pl: The requirement for identical file names and names of master
sequences is dropped. Violation of this requirement manifested itself as
file-not-found error during Viterbi-stage of hhblits. The code was streamlined.
* Added progress dots ... for prefiltering stage
* Cleaned up output for -v 1 and -v 0 options in hhblits and hhsearch: -v 1
should print out only warnings and errors
* Fixed segfault occuring with very long sequence names
* Minor improvements of documentation on installation
_______________________________________________________________________________
HH-suite 2.0.11 February 2012
* Corrected wrong paths for calls of ffindex binaries in hhblitsdb.pl
* Elminated rare segfault in realignment step that occurred in particular with
repeat proteins as queries
* Minor improvements of documentation on installatin, HHblits databases, FAQs
etc.
_______________________________________________________________________________
HH-suite 2.0.2 January 2012
* The iterative HMM-HMM search method HHblits has been added and the entire
package is now called HH-suite. HHblits brings the power of HMM-HMM comparison
to mainstream, general-purpose sequence searching and sequence analysis.
* The speed of HHsearch was further increased through the use of SSE3
instructions.
* A local amino acid compositional bias correction was introduced. Improvements
are slight (<=1%) on a standard SCOP single domain benchmark. However, the
improvement for more realistic sequences containing multiple domains, repeats,
and regions of strong compositional bias will is probably more pronounced.
The score shift parameter was changed from -0.01 to -0.03 bits and the mact
parameter from 0.5 to 0.35.
* The memory requirement for HMM-HMM aligment in hhsearch, hhblits, and hhalign
was reduced by a factor of ~2 by a better implementation of the Forward-
Backward algorithm
* Removed hhalign's -sto option for generating stochastic alignments. Sorry!
A better, deterministic replacement is planned for the future.
_______________________________________________________________________________
HHsearch 1.6.1 April 2011
* Increased speed by using SSE3 instructions in some functions
* Adding new option "-atab" for writing alignment as a table (with posteriors)
to file
* HHsearch is now able to read HMMER3 profiles (but should not be used due to a
loss of sensitivity)
* Removed a bug related with "neutralize HIS-tag" option
* Removed a bug related with empty alignments (no match states)
_______________________________________________________________________________
HHsearch 1.6.0 November 2010
* A new procedure for estimation of P- and E-values has been implemented that
circumvents the need to calibrate HMMs. Calibration can still be done if
desired. By default, however, HHsearch now estimates the lamda and mu
parameters of the extreme value distribution (EVD) for each pair of query
and database HMMs from the lengths of both HMMs and the diversities of their
underlying alignments. Apart from saving the time for calibration, this
procedure is more reliable and noise-resistant. This change only applies to
the default local search mode. For global searches, nothing has changed. Note
that E-values in global search mode are unreliable and that sensitivity is
reduced. Old calibrations can still be used:
-calm 0 : use empirical query HMM calibration (old default)
-calm 1 : use empirical db HMM calibration
-calm 2 : use both query and db HMM calibration
-calm 3 : use neural network calibration (new default)
* Previous versions of HHsearch sometimes showed non-homologous hits with high
probabilities by matching long stretches of secondary structure states,
in particular long helices, in the absence of any similarity in the amino
acid profiles. Capping the SS score by a linear function of the profile score
now effectively suppresses these spurious high-scoring false positives.
* The output format for the query-template alignments has slightly changed.
A 'Confidence' line at the bottom of each alignment block now reports the
posterior probabilities for each alignment column when the -realign option
is active (which it is by default). These probabilities are calculated in the
Forward-Backward algorithm that is needed as input for the Maximium ACcuracy
alignment algorithm. Also, the lines 'ss_conf' with the confidence values
for the secondary structure prediction are omitted by default. (They can
be displayed with option '-showssconf'). To compensate, secondary structure
predictions with confidence values between 7 and 9 are given in capital
letters, while for the predictions with values between 0 and 6 lower-case
letters are used.
* In the hhsearch output file in the header lines before each query-database
alignment, the substitution matrix score (without gap penalties) of the query
with the database sequence is now reported in bits per column. Also, the sum
of probabilities for each pair of aligned residues from the MAC algorithm
is reported here (0 if no MAC alignment is performed).
* The buildali.pl script now uses context-specific iterative BLAST (CSI-BLAST)
instead of PSI-BLAST. This considerably increases the sensitivity of
buildali.pl/HHsearch.
* Removed a bug which produced a segfault for input alignments with more than
15000 match columns. Now, the HHsearch binaries will issue a warning and
will transform only the first 15000 match columns into an HMM.
* Removed a bug in the multi-threading code that could lead to occasional
hang-ups (race condition) in situations where slow file access was impeding
program execution and inter-thread signaling was unreliable.
* Removed a memory leak and optimized memory management.
* Removed a bug in hhalign that could lead to unreasonably significant E-values
and probabilities due to calibration problems.
* HHsearch now performs realign with MAC-alignment only around Viterbi-hit
_______________________________________________________________________________
HHsearch 1.5.0 August 2007
* By default, HHsearch realigns all displayed alignments in a second stage
using the more accurate Maximum Accuracy (MAC) alignment algorithm
(Durbin, Eddy, Krough, Mitchison: Biological sequence analysis, page 95;
HMM-HMM version: J. S\"oding, unpublished). As before, the Viterbi algorithm
is employed for searching and ranking the matches. The realignment step is
parallelized (`-cpu <int>`) and typically takes a few seconds only. You can
switch off the MAC realignment with the -norealign option.
The posterior probability threshold is controlled with the -mact [0,1[
option. This parameter controls the alignment algorithm's greediness. More
precisely, the MAC algorithm finds the alignment that maximizes the sum of
posterior probabilities minus mact for each aligned pair. Global alignments
are generated with -mact 0, whereas -mact 0.5 will produce quite conservative
local alignments. Default value is -mact 0.35, which produces alignments of
roughly the same length as the Viterbi algorithm.
The -global and -local (default) option now refer to both the Viterbi
search stage as well as the MAC realignment stage. With -global (-local), the
posterior probability matrix will be calculated for global (local)
alignment. Note that '-local -mact 0' will produce global alignments from
a local posterior probability matrix (which is not at all unreasonable).
* An amino acid compositional bias correction is now performed by default.
This increases the sensitivity by 25% at 0.01 errors per query and by 5% at
0.1 errors per query. By recalibrating the Probabilities, the increased
selectivity of this new version allows to give higher probabilities for the
same P-values. Also, the score offset could be increased from -0.1 bits to 0
as a consequence.
* The algorithm that filters the set of the most diverse sequences (option
-diff) has been improved. Before, it determined the set of the N most
diverse sequences. In the case of multi-domain alignments, this could lead
to severely underrepresented regions. E.g. when the first domain is only
covered by a few fairly similar sequences and the second by hundreds of very
diverse ones, most or all of the similar ones were removed. The '-diff N'
option now filters the most diverse set of sequences, keeping at least N
sequences in each block of >50 columns. This generally leads to a total
number of sequences that is larger than N. Speed is similar. The default
is '-diff 100' for hhmake and hhsearch. Speed is similar. Use -diff 0 to
switch this filter off.
* The sensitivity for the -global alignment option has been significantly
increased by a more robust statistical treatment. The sensitivity in -global
mode is now only 0-10% lower than for the default -local option on a SCOP
benchmark, i.e. when the query or the templates represent single structural
domains. The E-values are now more realistic, although still not as
reliable as for -local. The Probabilities were recalibrated.
* A new binary HHalign has been added. It is similar to hhsearch, but performs
only pairwise comparisons. It can produce dot plots, tables of aligned
residues, and it can sample alternative alignments stochastically. It uses
the MAC algorithm by default.
* HHsearch and hhalign can generate query-template multiple alignments in
FASTA, A2M, or A3M format with the -ofas, -oa2m, -oa3m options
* Return values were changed to comply with convention that 0 means no errors:
0: Finished successfully
1: Format error in input files
2: File access error
3: Out of memory
4: Syntax error on command line
6: Internal logic error (please report)
7: Internal numeric error (please report)
8: Other
* Added script buildali.pl <file> to automatically build PSI-BLAST multiple
sequence alignments, including predicted and DSSP secondary structure.
Buildali.pl prunes sequences individually from both ends to reduce corruption
by non-homologous fragments (J. Soeding, unpublished).
* Added script hhmakemodel.pl <file.hhr> that parses hhsearch results files
and can generate FASTA or PIR multiple alignments or build rough 3D models.
* Removed a bug due to which pseudocounts where added to HMMer HMMs (which
already have their own pseudocounts added). This bug severely reduced
sensitivity for HMMs read in HMMer format.
* Moved a lot of memory allocation from stack to heap to avoid segmentation
faults under some Windows systems.
* Removed a bug due to which the query-template alignments where not displayed
on some platforms when output was directed to stdout
* Removed a bug that caused occasional segfaults under SunOS when reading HMMer
files
* Added multi-threading (-cpu <int>) for Windows x86 platform
* Cleaned up output formatting of summary list for Windows x86
* Stopped support for the Alpha/DEC platform
* Anyone still interested in Mac OSX/PPC or SunOS support?
_______________________________________________________________________________
HHsearch 1.2.0 January 2006
* Parallelized hhsearch for SMPs: using option -cpu <int>, hhsearch can be run
in parallel on several CPUs of a symmetric multiprocessor (SMP) machine. Speed
with two CPUs is ~1.9x, with two Dual Core AMDs ~3.4x.
Memory requirements for dynamic programming matrix is now
~100kb*(Query_length+2) * number of queue bins.
Number of queue bins = floor(1.2*CPUs+1) (for cpu>1), 1 otherwise
* HHsearch can now automatically detect and read HMMER format (ASCII).
Predicted secondary structure (or consensus secondary structure) can be
used in HMMER format in the same way as in HHsearch format. The script
addpsipred.pl is able to add the information for predicted secondary structure
to a HMMER file (as well as to a FASTA alignment). However, representative
sequences for query-template alignments are not supported by HMMER's format.
Performance is not as good HMMer-format as for hhm format, so use hhsearch's
hhm format as far as possible.
* Multiple databases can now be searched with -d 'db1.hhm db2.hhm ...'
* Added option -alt <int> for displaying up to <int> alternative alignments
that have no residue pairs in common (e.g. when aligning repeat proteins)
* Added percent sequence identity to alignment information
* Increased speed of hhsearch database searching by ~15%
* Added possibility to read from standard input and write to standard output
* Removed the limit 30 000 on maximum number of HMMs in database.
* Increased maximum number of match states in HMM from 6000 to 15000.
* Checked code sanity with Valgrind (a great tool! http://valgring.org)
and removed a bug that was responsible for rare segmentation faults.
* Removed a bug in the maximum sequence identity filtering routine. Before,
when two sequences had no overlap, the shorter of the two was filtered out.
This could have affected HMMs for multidomain proteins quite badly.
* Changed options for minimum and maximum numbers of sequences in output file
(options -p, -E, -B, -b, -Z, -z)
* Impoved sensitivity somewhat (~10%) by using option '-diff 100' as default
for hhmake and hhsearch (use '-diff 0' to turn this option off). Optimized
internal pseudocount admixture parameter with -diff 100 setting. Results
between HHsearch 1.2 and earlier versions may therefore differ slightly.
* Perl scripts: Added -Q option to alignblast.pl that allows to include query
sequence as first sequence of extracted alignment. Script addpsipred.pl can
now add PSIPRED-predicted secondary structure also to HMMER model files.
* Included binaries for Windows32 (a bit slow, no threads supported)
_______________________________________________________________________________
HHsearch 1.1.4 March 2005
* Removed small memory allocation bug in function that reads .hhdefaults file.
* Reserve dynamic programming memory dynamically. Memory requirement is now
6*MAXRES*(Query_length+2) bytes (+ space for ALL sequences in SEQ records of
HMM database).
* Changed maximum number of sequences in input alignments from 20 000 to 30 000
* Changed maximum length of sequences in input alignments from 20 000 to 30 000
* Changed maximum number of match states MAXRES in input alignments from 5000
to 6000
* In output alignments, changed the name of the consensus sequences from
'Cons-<id>' to 'Consensus'.
* Added secondary structure-dependent gap penalties ('-ssgap' option).
Gap opening within SS elements costs X bits after 1st, 2X bits after 2nd and
3X bits after third SS residue, where X is 1.0 bit by default and can be
changed with the '-ssgapd X' option.
=> A small benchmark on the CAFASP4 targets showed NO improvement!
* Included binaries for MacOSX (Darwin) and OSF1/Tru64_UNIX (thanks to Paul
Sarokas, GSK)
_______________________________________________________________________________
HHsearch 1.1.3 November 2004
* Allowed up to 10000 sequences to be displayed in output alignments
* Added warning message when not enough memory available
* Corrected warning message when not enough different superfamilies are
contained in the searched database to fit the score distribution.
* Removed a bug in the part that builds the output alignments that could cause
a segmentation fault on rare occasions.
* Remove carriage return (CR) symbols in names of input sequences that
could mess up output alignments when sequences were read in via a web form.
* Corrected a bug in the calculation of transition probabilities for insert
states. (Improvement in sensitivity by ~2%.) Need to recalculate old HMMs.
* Changed format of hitlist in output. Now, the first 30 characters of the
sequence name and description are shown, not only the ID and the family code:
(No Hit Prob E-value...). This is more suitable for databases other than SCOP
* Removed limit of 200 characters for sequence descriptions to be able to
display the extensive annotation for Pfam and SMART alignments, for instance.
* In the output alignment the numbering of the consensus sequence did not
correspond to the match columns anymore. This has been corrected.
_______________________________________________________________________________
HHsearch 1.1 August 2004
* Format of *.hhm files was changed. Transition pseudocounts are now
calculated on the basis of the effective number of sequences that are in
M states, I states and M states, respectively. These numbers are listed
as Neff, Neff_I, Neff_D in seperate columns of the hhm files. (Improvement
in sensitivity by a few percent.)
* Reduced memory requirements by a factor of four. Memory requirement now is
approx. 6*MAXRES^2 bytes ( + space for ALL sequences in SEQ records of HMM
database). MAXRES is maximum number of query residues, 5000 at the moment
_______________________________________________________________________________
HHsearch 1.0 May 2004
* Version used for benchmark published in Soding J, Bioinformatics (2005).