Hennig86 Simplified

Contact Arnold Kluge at the University of Michigan or Diana Lipscomb at George Washington University to purchase this program ($50).



A note on numbering:
Perhaps one of the most frustrating things for first-time users of Hennig86 is the way it treats anything numbered. Everything begins with zero (0), not one (1)! Thus the taxa in your matrix, for the purposes of setting outgroups, inputting tree topologies, etc., are considered by Hennig86 to occur in the order 0, 1, 2, 3... and your characters, for the purposes of ordering/unordering, are in the order 0, 1, 2, 3... Similarly, internal treefiles are numbered 0 through 9. Trees output from an analysis are numbered from 0, so if a bunch of them scroll past your eyes and the last one says "tree 5" it is the sixth tree.


Data Matrix

Create the data matrix with a text editor (e.g., EDIT, Norton Editor, or WEDIT) or with a word processor (e.g., Word or WordPerfect, making sure to save as "text-only", "unformatted", or "DOS text" as the case may be).

The DOS file suffix can be anything you choose, or absent altogether.

Hennig86 recognizes the command "xread" to read in a data matrix. Immediately following "xread" can be a title enclosed in single quotes and then the number of characters followed by the number of taxa, or the title can be omitted.

The following matrices (filename = test.hen) are identical. Do not put apostrophes or semicolons in your title (e.g., 'Smith's data; reanalysis').

xread
'test file'
7 6
zero 0000000
one 0001110
two 0111211
three 0111111
four 1012111
five 1012110
;

xread
'test file'
7 6
zero
0 0 0 0 0 0 0
one
0 0 0 1 1 1 0
two
0 1 1 1 2 1 1
three
0 1 1 1 1 1 1
four
1 0 1 2 1 1 1
five
1 0 1 2 1 1 0
;

xread 'test file' 7 6 zero 0000000 one 0001110 two 0111211 three 0111111
four 1012111 five 1012110;

Thus, taxon names can be of any length, and any number of spaces or carriage returns can be inserted between taxon names and character states.

To designate unknown or inapplicable states use either "-" or "?". By convention, "-" denotes inapplicable and "?" denotes unknown, though they are treated exactly the same by the algorithm.

The semicolon at the end is optional for Hennig86 but is absolutely required for CLADOS, and for certain algorithms in Random Cladistics.
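If you keep your data somewhere else (a spreadsheet, say), a few lines of Python can write the matrix in the first layout shown above. This is only a sketch of the format as described in this guide; the function name xread_block is made up for the illustration and is not part of Hennig86.

```python
# Hypothetical helper: builds an xread block in the layout shown in this guide.
TAXA = [
    ("zero", "0000000"),
    ("one", "0001110"),
    ("two", "0111211"),
    ("three", "0111111"),
    ("four", "1012111"),
    ("five", "1012110"),
]


def xread_block(title, taxa):
    """Return the text of an xread block for a list of (name, states) pairs."""
    if "'" in title or ";" in title:
        raise ValueError("title must not contain apostrophes or semicolons")
    nchar = len(taxa[0][1])
    for name, states in taxa:
        if len(states) != nchar:
            raise ValueError(f"{name}: expected {nchar} states, got {len(states)}")
    # Number of characters comes first, then number of taxa.
    lines = ["xread", f"'{title}'", f"{nchar} {len(taxa)}"]
    lines += [f"{name} {states}" for name, states in taxa]
    lines.append(";")  # optional for Hennig86, required by CLADOS
    return "\n".join(lines)


print(xread_block("test file", TAXA))
```

Counting the characters and taxa from the data, rather than typing them by hand, avoids the mismatch between header and matrix described under TROUBLESHOOTING.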

To read in the matrix, type "proc test.hen;" at the *> prompt. If you receive a "proc>" prompt it means you neglected to include the semicolon after "proc test.hen"; simply type the semicolon and hit [ENTER] to continue.

Optionally, you may place the command "proc /;" at the end of the data matrix. This will allow all successive commands to be entered on a single line from DOS (e.g., C:\ ss proc test.hen; mh; bb; tp;) instead of having to do it interactively.

I recommend operating Hennig86 interactively until you know what you're doing.

You may also enter a data matrix while you are in Hennig86. However, you can't save it, so why bother. To do this, type the following. By leaving off the semicolon, Hennig86 stays in xread mode, giving you the xread> prompt.

*> xread 'test file'
xread> 7 6
xread> zero 0000000
xread> one 0001110
xread> two 0111211
xread> three 0111111
xread> four 1012111
xread> five 1012110; the semicolon tells Hennig86 the end of input
*>


Running Hennig86

To start up Hennig86 just type "ss" at the DOS prompt. You will be met with the generic Hennig86 prompt *>.

To read in your matrix type "proc" and your filename followed by a semicolon (e.g., *> proc test.hen; ). If you receive a procedure> prompt it means you forgot to put the semicolon in... just enter it at this prompt and all will be fine.

HOWEVER... You will usually want both to see what you are doing and to have the results saved to a logfile that can be examined later. By default, Hennig86 outputs only to the screen, not to a file. However, if you define a logfile, output will go only to the logfile and not to the screen! To get both, begin by declaring a logfile of any name with a "log filename;" command, followed by the display-on command "display*;".

*> log test.out; opens a file for output
*> display*; allows screen display as well
*> proc test.hen; reads in contents of test.hen

and continue. This will create a logfile called "test.out" and will turn screen display on.

At any time you can suppress output to the logfile by typing "log-;" and turn it back on by typing "log*;". Similarly you can turn off the output to the screen by typing "display-;". You might want to do this as you fiddle around and figure out what's happening to your data as you analyse it, without dumping extraneous stuff to your logfile.
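If the interplay of the log and display switches is hard to keep straight, the toy model below may help. It only encodes the behaviour described in this guide (opening a logfile turns the screen off, closing it with "log/" turns the screen back on); it is not Hennig86's own code, and the class name OutputState is invented for the illustration.

```python
class OutputState:
    """Toy model of where Hennig86 output goes, as described in this guide."""

    def __init__(self):
        self.log_open = False   # no logfile until a log command is given
        self.log_on = False
        self.display_on = True  # the screen is the default destination

    def run(self, command):
        """Apply one command, written without its terminal semicolon."""
        if command == "display*":
            self.display_on = True
        elif command == "display-":
            self.display_on = False
        elif command == "log-":
            self.log_on = False
        elif command == "log*":
            self.log_on = self.log_open  # can only resume an open logfile
        elif command == "log/":
            self.log_open = self.log_on = False
            self.display_on = True       # screen is the default again
        elif command.startswith("log "):
            self.log_open = self.log_on = True
            self.display_on = False      # opening a logfile silences the screen
        else:
            raise ValueError(f"not an output command: {command}")

    def destinations(self):
        """Return the list of places that output currently reaches."""
        result = []
        if self.display_on:
            result.append("screen")
        if self.log_on:
            result.append("logfile")
        return result


state = OutputState()
for command in ["log test.out", "display*", "log-", "log*", "log/"]:
    state.run(command)
    print(command, "->", state.destinations())
```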


Setting Outgroups

Hennig86 will automatically choose the first taxon in the matrix as the outgroup if none is specified.

To specify outgroups type "outgroup = " followed by their numbers. TAXA ARE NUMBERED FROM ZERO! NOT FROM ONE!

For example:

*> outgroup = 0 1 2;

To view which taxa have been set as outgroups type:

*> outgroup;

Okay this next bit is important so pay attention... Hennig86 will not constrain all designated outgroup taxa to sit outside of the ingroup if it is not globally parsimonious to do so (usually it is though if you have chosen ingroup and outgroup taxa with care). If more than one taxon is designated as an outgroup, one of them is always considered to be the "prime" outgroup taxon. That is, the ultimate root for the tree is determined by a single prime outgroup taxon. BY DEFAULT, THIS IS THE NUMERICALLY-FIRST TAXON DESIGNATED. If you wish some other outgroup taxon to be designated as prime (for example taxon 1 instead of taxon 0 in the example above), type the following:

*> outgroup = 0 1 2 /1;

The front-slash designates the prime taxon.
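Because the numbers start at zero, it is easy to be off by one when typing an outgroup command. The sketch below looks the numbers up from the taxon names in matrix order and assembles the command text for you; outgroup_command is a hypothetical helper, not something Hennig86 provides.

```python
def outgroup_command(matrix_order, outgroups, prime=None):
    """Build an 'outgroup = ...;' command from taxon names.

    matrix_order: taxon names in the order they appear in the matrix.
    outgroups:    names of the taxa to designate as outgroups.
    prime:        optional name of the prime outgroup taxon.
    """
    # list.index() is zero-based, which is exactly Hennig86's numbering.
    numbers = sorted(matrix_order.index(name) for name in outgroups)
    command = "outgroup = " + " ".join(str(n) for n in numbers)
    if prime is not None:
        if prime not in outgroups:
            raise ValueError("the prime taxon must be one of the outgroups")
        command += f" /{matrix_order.index(prime)}"
    return command + ";"


taxa = ["zero", "one", "two", "three", "four", "five"]
print(outgroup_command(taxa, ["zero", "one", "two"], prime="one"))
```

For the test matrix this prints the same command as the example above.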


Character Order/Weighting/Activation

Note, it is impossible to interactively inactivate taxa with Hennig86. This can only be accomplished by editing the input file, and changing the specified number of taxa following the xread statement.

To view the current status of character codes type "ccode;" alone. Something like the following will be shown:

 0   1   2   3   4   5   6
1+[ 1+[ 1+[ 1+[ 1+[ 1+[ 1+[

The top line is the character number (FIRST = ZERO). The second line has three positions for each character. The first position is the weight (all weighted 1 above). The second position indicates whether the character is "ordered" (+) or "unordered" (-). The third position indicates whether the character is active ([) or inactive (]).
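To make the three positions concrete, here is a small Python sketch that takes one code such as "1+[" apart. It assumes the weight is whatever digits precede the last two symbols; the function name parse_code is made up for the illustration.

```python
def parse_code(code):
    """Split a ccode status entry such as '1+[' into its three parts."""
    weight, order, activity = code[:-2], code[-2], code[-1]
    if not weight.isdigit() or order not in "+-" or activity not in "[]":
        raise ValueError(f"not a character code: {code!r}")
    return {
        "weight": int(weight),
        "ordered": order == "+",     # '+' ordered, '-' unordered
        "active": activity == "[",   # '[' active, ']' inactive
    }


print(parse_code("1+["))  # the default: weight 1, ordered, active
print(parse_code("2-]"))  # weight 2, unordered, inactive
```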

NOTE: the default options in Hennig86 are as above: unitary weight, ordered and active. If this is not your preference you must change it on your own. For example, if you are running DNA sequence characters, you must unorder all multistate characters.

To change weights type "ccode" followed by a slash and the weight followed by the character(s) to be weighted. For example:

*> ccode /2 3 5; weights the fourth and sixth by 2

To change "ordering" is similar:

*> ccode - 3 5; unorders the fourth and sixth

To change whether or not the character is active:

*> ccode ] 3 5; inactivates the fourth and sixth

Ranges of characters can be identified by placing a period between them. For example:

*> ccode - 3.5; unorders 3 through 5
*> ccode - .; unorders all characters
*> ccode /+[ .; returns everything to default of 1+[

More than one coding change can be performed in a single command:

*> ccode - 10.23,5 /2 8,17,24 /3 6,10 ] 16;

The above will unorder the eleventh through the 24th, as well as unorder the 6th, will weight the ninth, eighteenth and 25th by 2, will weight the seventh and eleventh by 3 and will inactivate the seventeenth.
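If you want to double-check which characters a specification such as "10.23,5" covers before you type it, the following Python sketch expands it and also reports the ordinary (count-from-one) positions. It handles only the forms used in this guide (single numbers, first.last ranges, and a lone period for all characters); expand_characters is a hypothetical name.

```python
def expand_characters(spec, nchar=None):
    """Expand a ccode character list into zero-based character numbers."""
    numbers = []
    for part in spec.replace(",", " ").split():
        if part == ".":
            # A lone period means every character, so the total is needed.
            if nchar is None:
                raise ValueError("'.' needs the number of characters")
            numbers.extend(range(nchar))
        elif "." in part:
            first, last = part.split(".")
            numbers.extend(range(int(first), int(last) + 1))  # inclusive
        else:
            numbers.append(int(part))
    return numbers


unordered = expand_characters("10.23,5")
print(unordered)                     # Hennig86 numbers (first = 0)
print([n + 1 for n in unordered])    # ordinary positions (first = 1)
```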


Finding the most parsimonious trees

Hennig86 has many ways to do this depending on how exhaustive you wish to be.

Exhaustive

For fewer than 20 taxa the exhaustive procedures can be used efficiently.

*> ie-;

finds only one of possibly many equally parsimonious trees.

*> ie;

finds all equally parsimonious trees provided that there are fewer than 100 of them. At 100, OVERFLOW is indicated.

*> ie*;

uses all available memory to save equally parsimonious trees regardless of number.

Heuristic

*> hennig;

pretty much useless on its own. Calculates one tree by a single pass through the data. Not likely to be of shortest length. As a start-up for branch swapping mhennig is better anyway.

*> mhennig;

calculates multiple trees by multiple passes through the data. The trees are not likely to be most parsimonious, if the data are complicated. Used as starting point for branch swapping.

*> bb;

does nothing if hennig or mhennig has not been done first.

*> mhennig; bb;

generates multiple trees and then applies branch swapping to them to find multiple equally parsimonious trees (up to 100).

*> mhennig; bb-;

as above but holds onto only one tree in the end.

*> mhennig; bb*;

as above but holds onto more than 100 if there are that many.

In practise you will find that "mhennig;bb;" is a decent place to start looking at your results. However, I would not publish until you have at least attempted "ie;"!

NOTE THAT PERFORMING THE ABOVE ONLY FINDS THE TREES, IT DOES NOT OUTPUT THOSE TREES ANYWHERE, NOR DISPLAY THEM.

To view/output all resulting trees currently in memory:

*> tplot;

You might want to take a quick look at the number found before actually doing this!

NOTE: if you want to look at them before sending them to a file, don't forget to type "log-;" first to turn off your logfile.

If there are multiple trees the screen will keep scrolling; use [CTRL]-S to pause the scroll.

NOTE: trees are numbered from 0 (like everything else in Hennig86).

CAUTION: often, the output to a logfile from a tplot is unreadable later; if so, go back, re-do the analysis and type "txascii-;" first. Trees will then be output to the logfile in angular form.

To view/output all resulting trees in parenthetical notation:

*> tlist;
see later for interpreting these


Consensus Trees

Hennig86 will only construct a strict consensus. The command is "nelsen;" (with an e). It does not create a Nelson (with an o) consensus (i.e., combinable components), nor a majority-rule consensus, nor an Adams consensus.

Doing so will wipe out any memory of the original trees unless you saved them as an internal treefile.

To see the consensus you must follow with a "tplot;" command of course.


Internal Treefiles

Hennig86 allows you to hold on to some of your steps as you go along, using the keep and get commands.

Let's say you want to: 1) calculate a set of MEPT's by a heuristic search command, 2) view all of the trees, 3) construct a consensus tree without erasing the original trees, and 4) view the consensus. Then you decide that you are satisfied with the results and wish to 5) output the original trees as well as 6) output the consensus.

If you entered the commands in the order mhennig; bb; tplot; (to see the trees) and then nelsen; this last command would create the consensus but would also wipe out all memory of the trees on which it was based!!!!

The following will circumvent this:

*> proc test.hen; reads in data

*> outgroup = 0 1; sets outgroups

*> mhennig; calculates trees

*> bb; applies branch breaking

*> tplot; outputs trees to screen

*> keep 1; puts trees found by bb into internal treefile #1.

*> nelsen; constructs a consensus

*> tplot; views the consensus

*> keep 2; puts the consensus in internal treefile #2

*> log test.out; opens a logfile for output

*> display*; allows screen display as well

*> get 1; retrieves trees in treefile #1;

*> tplot; outputs trees to screen and test.out

*> get 2; retrieves consensus tree in treefile #2;

*> tplot; outputs consensus tree to screen and to test.out
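The bookkeeping in the session above can be mimicked with a few lines of Python, which may make it easier to see what ends up where. This is a toy model of the keep, get and erase behaviour as this guide describes it, with trees reduced to labels; the class TreeFiles and its method set_current are invented for the illustration.

```python
class TreeFiles:
    """Toy model of the ten internal treefiles (#0 is the resident one)."""

    def __init__(self):
        self.slots = {n: [] for n in range(10)}

    def _check(self, n):
        if n not in self.slots:
            raise ValueError("internal treefiles are numbered 0 through 9")

    def set_current(self, trees):
        """Stand-in for any action that produces trees (bb, nelsen, ...)."""
        self.slots[0] = list(trees)

    def keep(self, n):
        """Copy #0 into #n, overwriting whatever #n held."""
        self._check(n)
        self.slots[n] = list(self.slots[0])

    def get(self, n):
        """Copy #n into #0, overwriting whatever #0 held."""
        self._check(n)
        self.slots[0] = list(self.slots[n])

    def erase(self, n):
        """Empty #n."""
        self._check(n)
        self.slots[n] = []


files = TreeFiles()
files.set_current(["tree 0", "tree 1", "tree 2"])  # mhennig; bb;
files.keep(1)                                      # keep 1;
files.set_current(["consensus"])                   # nelsen; replaces #0
files.keep(2)                                      # keep 2;
files.get(1)                                       # get 1;
print(files.slots[0])  # the original trees are back in #0
files.get(2)                                       # get 2;
print(files.slots[0])  # now the consensus
```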

If you study the above you will realize the logic of the following rules:

Internal treefiles are numbered 0 through 9.

Treefile #0 is always current.

Any action will wipe out the contents of #0 replacing it with the results of that action. These actions include calculating a consensus, choosing some of the MEPT's with a tchoose command (see Choosing Trees below), calculating new trees after weighting etc., retrieving a stored treefile, or importing a saved external treefile (see External Treefiles below).

"keep n;" puts tree(s) in #0 into internal treefile #n and leaves them in #0. If there was already something in #n it will be wiped out.

"get n;" retrieves tree(s) from #n and puts them into #0, and leaves them in #n. Whatever was in #0 is wiped out.

"erase n;" deletes the contents of #n.


Abbreviating/Combining/Repeating commands on a single line.

Now that you know something about how Hennig86 works, you should be aware that multiple commands can be put on a single line to be executed as a batch. If you do so, you will not be able to stop it until it's finished.

Also most Hennig86 commands have short forms that save on typing time.

Thus the above series of commands detailed in the internal treefile section can be done as follows:

*> proc test.hen;

*> outgroup = 0 1; mhennig; bb; tplot; keep 1;

*> nelsen; tplot; keep 2;

*> log test.out; display*; get 1; tplot; get 2; tplot;

Or as follows:

*> p test.hen;

*> o = 0 1; m; bb; tp; k 1; n; tp; k 2; l test.out; d*; g 1; tp; g 2; tp;

In general it takes some time to figure this out. Other less severe abbreviations are allowed (e.g., mhennig = mh = m and outgroup = out = o).
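One way to picture the abbreviation rule is as prefix matching: a typed word is accepted if it starts with the shortest form of a command and is itself the start of the full command name. The Python sketch below applies that idea to the abbreviations used in this guide. The table is taken from the examples here and is not a complete list, and the function reports ambiguous input as unknown rather than guessing as Hennig86 does.

```python
# Full command name -> shortest form, as used in the examples in this guide.
SHORTEST = {
    "mhennig": "m", "hennig": "h", "outgroup": "o", "procedure": "p",
    "tplot": "tp", "tchoose": "tc", "keep": "k", "get": "g",
    "nelsen": "n", "log": "l", "display": "d", "xsteps": "xs",
    "ccode": "cc", "yama": "y",
}


def expand_abbreviation(typed):
    """Return the full command for an abbreviation, or None if unclear."""
    matches = [
        full
        for full, short in SHORTEST.items()
        if full.startswith(typed) and typed.startswith(short)
    ]
    return matches[0] if len(matches) == 1 else None


for word in ["m", "mh", "mhenn", "out", "proc", "tp", "t"]:
    print(word, "->", expand_abbreviation(word))
```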

Commands previously typed and executed can be repeated or repeated and edited by typing F3.

A more complete listing of commands and abbreviations can be found on the last pages of this guide.


Choosing Trees

Occasionally, you may like one or more of the MEPT's over some of the others, though I can't imagine why. Alternatively, because the tplot command dumps out all trees on the screen at once, you may want to look at them one at a time.

To do this you need to invoke the tchoose command (or just tc). But be aware that like taxa and characters, trees are numbered from zero, not from one.

In practise you'd be wise to save all trees in an internal treefile first so you can get them back easily.

For example:

*> mh;bb*; calculates MEPTs allowing >100

*> k 1; stores them in internal #1

*> tc 0; keeps only the first (0) in resident (#0)

note that the 0 after tc refers to the first of the
multiple trees, NOT treefile #0.

*> tp; looks at it

*> g 1; retrieves all trees

*> tc 1; tp; keeps only the second and looks at it

*> g 1; tc 2; tp; you get the idea


Number of Steps on Trees and Character Optimization

The xsteps (or xs) command is very versatile depending on what options you use.

*> xs; (same as xs l)

outputs the following for 5 equally parsimonious trees in the default (#0) internal treefile:

 0  1  2  3  4
12 12 12 12 12

which is telling you that all 5 trees have 12 steps given the data at hand.

Let's say this was based on ordered multistate data and you want to see what effect unordering has with respect to the number of implied steps on these 5 trees (as opposed to recalculating trees from the revised codes). The following would accomplish this:

*> cc -.; unorders all characters

*> xs; optimizes all characters on the 5 pre-existing trees

If the results are as follows:

 0  1  2  3  4
10 11 10 11 11

you'd know that, though all trees are equivalent with respect to the ordered data, they are not all equivalent with respect to unordered data.

For every character on every tree, "xsteps c;" will output the number of steps, the character CI and the character RI under the character's number. For example:

  0   1   2   3   4   5   6   7
  1   2   1   1   3   2   4   1
100  50 100 100  66  50  25 100
100   0 100 100 100  75  50 100
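The CI row in this sample can be checked by hand. The consistency index of a character is conventionally the minimum number of steps the character could possibly take divided by the number of steps it takes on the tree. The Python sketch below reproduces the CI row under two assumptions that are mine and not stated in this guide: that the minimum is 1 step for every character except character 4, which is taken to need 2, and that the percentage is truncated rather than rounded (which is what the 66 in the sample suggests).

```python
def consistency_index(min_steps, observed_steps):
    """Return CI as a whole-number percentage, truncated (2 of 3 -> 66)."""
    if min_steps < 1 or observed_steps < min_steps:
        raise ValueError("need 1 <= min_steps <= observed_steps")
    return (100 * min_steps) // observed_steps


observed = [1, 2, 1, 1, 3, 2, 4, 1]  # the steps row of the sample output
minimum = [1, 1, 1, 1, 2, 1, 1, 1]   # ASSUMED minimum steps per character

ci = [consistency_index(m, s) for m, s in zip(minimum, observed)]
print(ci)
```

The RI row cannot be checked the same way because it also depends on the maximum possible number of steps for each character, which the sample does not show.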

If there are multiple trees in resident memory, "xsteps c;" will report these values for each tree, followed by the best fit of each character among all trees and the worst fit of each character among all trees. Typing "xsteps m;" will report the latter (best and worst) alone.

"xsteps h;" is a little more complicated and has to do with character state changes and optimizations. In practise you'll find CLADOS more helpful. For each tree in resident memory (i.e., internal treefile #0) and for each character, it pumps out the state (or possible states) implied for each internal node, with the nodes numbered as they are in the output of a tplot. This has very limited utility, as the number of an ancestral node will vary from one equally parsimonious tree to the next, so the nodes are not directly comparable across MEPT's.


External Treefiles

The command to save trees to an external file is "tsave" and is followed by a filename of your choice.

So, for example, suppose you have two data sets in files test.one and test.two, WITH THE CAVEAT THAT BOTH DATA SETS HAVE THE SAME NUMBER OF TAXA ENTERED IN THE SAME ORDER IN THE MATRIX. Consider the following:

*> proc test.one;

*> mh;bb;

*> tsave one.tre; saves trees to a file

*> proc test.two; imports new data set wiping out old

*> mh; bb;

*> tsave two.tre; saves trees to file

*> proc test.one; retrieves first data set

*> proc two.tre; retrieves results of second

*> xs; forces first data on second results

*> proc test.two; retrieves second data set

*> proc one.tre; retrieves results of first

*> xs; forces second data on first results

If you look at the saved file (e.g., view one.tre) you'll notice it's in parenthetical notation, with taxon numbers standing in for the taxa. This is why, if you do this for different data sets, everything must be in the same order.

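To create a topology of your own you need to know how to interpret trees in parenthetical notation. A tree in which taxa 0 and 1 are sister taxa, taxa 3 and 4 are sister taxa, and taxon 2 is the sister of 3 plus 4 is written as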

((0 1) (2 (3 4)))

where a pair of matched parentheses delimits a monophyletic group

Unresolved taxa (a polytomy) are represented as follows:

((0 1) 2 (3 4))

To get used to this, I recommend that you follow a tplot with a tlist and compare the two.
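Another way to get a feel for the notation is to turn it into nested groups, one group per pair of matched parentheses. The Python sketch below does this for fully parenthesized trees like the two above; it is an illustration of the notation only and does not handle the shortcuts or the more advanced notations in the Hennig86 manual. The function name parse_tree is made up.

```python
def parse_tree(text):
    """Turn '((0 1) (2 (3 4)))' into nested tuples of taxon labels."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack = [[]]  # stack[-1] is the group currently being filled
    for token in tokens:
        if token == "(":
            stack.append([])
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unmatched ')'")
            group = tuple(stack.pop())
            stack[-1].append(group)
        else:
            stack[-1].append(token)  # a taxon number or name
    if len(stack) != 1:
        raise ValueError("unmatched '('")
    top = stack[0]
    return top[0] if len(top) == 1 else tuple(top)


resolved = parse_tree("((0 1) (2 (3 4)))")
unresolved = parse_tree("((0 1) 2 (3 4))")
print(resolved)
print(unresolved)
print(len(unresolved))  # three branches at the root: a polytomy
```

Each tuple is one monophyletic group; a tuple with more than two members is an unresolved node.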

In any case, you may input a tree topology of your own making into the resident treefile (i.e., #0) wiping out any tree(s) that are there currently as follows:

*> tread ((0 1) (2 (3 4)));

which is the same as using the taxon names:

*> tread ((one two) (three (four five)));

For safety, you should match parentheses. For speed, you don't have to:

*> tread one two) (three (four five;

is the same as above. Consult the Hennig86 manual for more advanced notations.

Another convenient way of building trees is as follows:

*> tread (0 4) (leave off semicolon so you can keep on tread-ing)
tread> (0 1) replace 0 above with 1 and 0 as sister taxa
tread> (4 2) replace 4 above with 2 and 4 as sister taxa
tread> (4 3); replace 4 above with 3 and 4 as sister taxa, semicolon=end

which builds exactly the same tree as above. Note that the pre-existing taxon must appear first in the replacement pair.
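If you would like to convince yourself that the step-by-step version really gives the same tree, the replacement rule can be played out in Python. This is a sketch of the rule as described here (each pair replaces its first member, which must already be on the tree), using nested tuples for trees; the names replace_taxon and canonical are invented for the illustration.

```python
def replace_taxon(tree, pair):
    """Replace the taxon pair[0] on the tree with the sister pair."""
    if tree == pair[0]:
        return pair
    if isinstance(tree, tuple):
        return tuple(replace_taxon(child, pair) for child in tree)
    return tree  # some other taxon: leave it alone


def canonical(tree):
    """Ignore left/right order so that (3 4) and (4 3) compare equal."""
    if isinstance(tree, tuple):
        return frozenset(canonical(child) for child in tree)
    return tree


tree = (0, 4)                                # tread (0 4)
for pair in [(0, 1), (4, 2), (4, 3)]:        # the three tread> lines
    tree = replace_taxon(tree, pair)
    print(tree)

typed_in_full = ((0, 1), (2, (3, 4)))        # tread ((0 1) (2 (3 4)));
print(canonical(tree) == canonical(typed_in_full))
```

The groups come out in a different left-to-right order, but they contain the same taxa, so the two topologies are the same.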


Leaving Hennig86

*> yama

or just

*> y


TROUBLESHOOTING

WHEN I TYPE A COMMAND I GET THE COMMAND NAME FOLLOWED BY A QUESTION MARK.

The command does not exist.

WHEN I TYPE A COMMAND, I GET THE COMMAND NAME FOLLOWED BY A ">".

You forgot the terminal semicolon. Enter it now.

WHEN I TRY TO READ IN MY DATA FILE I GET CHARACTERS AND A ? RETURNED AND I CAN'T RUN MY DATA SET.

Your input file is not formatted correctly. Likely, the number of characters and number of taxa were entered in reverse order (common among PAUP users), or the numbers do not match the matrix size.

WHEN I TRY TO READ IN MY DATA FILE I GET "open " AND THEN WHEN I TRY TO RUN THE DATA IT TELLS ME "no data".

The filename you used is wrong, or the file is not in the current directory.

WHEN I TRY TO SET OUTGROUPS, HENNIG86 DOES NOT SET THEM TO THE NUMBERS I HAVE REQUESTED.

You typed "outgroup # # #;" and forgot the "=" sign. Or you're forgetting that the first taxon in your matrix is #0 not #1.

WHEN I RUN MY DATA SET I GET AN UNREALISTIC TREE WITH HUGE NUMBERS OF STEPS.

You've made the mistake of using 9's for missing data (an old PAUP-ism) instead of "?" or "-".

WHEN I TYPE "bb" NOTHING HAPPENS.

bb must be preceded by h or mh

WHEN I RUN MY DATA SET, NOT ALL OF MY OUTGROUPS FALL OUTSIDE OF THE INGROUP.

That's life. You've violated the primary assumption of monophyly of the ingroup and any results are suspect. Welcome to global parsimony.

WHEN I EXAMINE MY LOGFILE WITH A WORD PROCESSOR, THE TREES ARE FULL OF WEIRD CHARACTERS INSTEAD OF LINES.

Try a different font. Or go back into Hennig86 and type "txascii-;" first.


List of commands, their abbreviations and switches:

Note: only the shortest form of a command must be typed; any portion of the rest of the command is optional. For example, for mhennig, m must be typed, but mh, mhe, mhen, mhenn, mhenni, or mhennig will all have the desired effect. Where an ambiguous abbreviation is given (e.g., "t"), Hennig86 will choose for you!

Switches are options that may follow the command depending on what you want to do. Below they are enclosed in curly braces, which are not part of the command.

assist
brings up list of commands, the assistance is not breathtaking

batch{-}
(nil) cause exit to DOS on any error
- remain in Hennig86 on any error [default]

bb { -, * }
branch breaking on pre-existing trees
(nil) find many up to 100
- find one
* find all up to maximum available memory

bytes
indicates how much memory is taken up by Hennig86

ccode{ -, +, /, ], [ }
character coding display and change

(nil) list current codings
- unorder
+ order
/ weight
] deactivate
[ activate

ckeep
requires a numerical argument (0 through 9) indicating an internal character-
coding file in which to save the codes as they are currently set and from
which they can be retrieved later, see cget

cget
requires a numerical argument (0 through 9) indicating the internal character-
coding file (previously saved by a ckeep) from which codes are to be re-set.

display{ -, * }
monitor display toggle
- turn off [default after a log command]
* turn on [default before a log command, or after logfile is closed]

erase
requires a numerical argument indicating the internal treefile to erase

files
list internal treefiles by number of each, number of trees in each, and the
command that created each.

get
requires a numerical argument specifying the treefile to retrieve.

hennig
calculate one tree by one pass through the data

mhennig
calculate multiple trees by multiple passes through the data

ie{ -, * }
implicit enumeration of all trees to find shortest tree(s)

(nil) find many up to 100
- find one
* find all

keep
requires a numerical argument specifying the internal treefile in which to save
the trees currently in resident memory

log{ -, *, / }
logfile opening and output control

(filename) open filename
- stop outputting
* resume outputting
/ close

nelsen
construct a strict consensus of current trees

outgroup{ =, / }

(nil) show status of outgroup
= set outgroup
/ set prime taxon

procedure
a fairly complicated way of looking at what you're doing. If you specify a filename, Hennig86 looks for the file in the current directory. If the file is not there, it will open one with that filename and expect you to start inputting data at the procedure> prompt. Once you specify proc/; it closes the procedure file. In so far as prepared files go, the file may contain the data matrix alone (i.e., an xread command), and you will then be able to work with Hennig86 interactively. Or it may contain any series of commands following an xread command, to cause Hennig86 to go ahead and do everything without you having to type anything. Usually you'll just want to put a data matrix in the file so you can run Hennig86 interactively. More than one proc statement may be involved in your running of Hennig86, for example, one for reading the data matrix and one later for reading in trees saved by a tsave command.

quote
allows comment lines to be sent to your logfile
e.g., quote the next trees are with states unordered;

reroot
used when trees have been calculated (at the expense of your time) and a decision to change the outgroup composition has been made afterwards (i.e., use reroot immediately following an out= command). The tree(s) are re-rooted without having to recalculate them.

steps
longer output than xsteps

tchoose
requires numerical argument(s) specifying which tree(s) of those currently
in resident memory are to be kept, the rest are discarded.

tlist
like tplot but in parenthetical notation

tplot
show me all trees in resident memory

tread
read in trees in parenthetical notation

tsave
output a Hennig86 procedure file to disk which has a tread and the current
trees in parenthetical notation.

txascii{ -, * }

- extended characters off
* extended characters on

view{;}
file viewer.

(filename) view filename
/ pg down
\ pg up
. line down
- line up
* end of file
; or , close
(do not use the semicolon on the command line, as it will automatically close the file
before you get a chance to see the whole thing)

watch
programmer's vanity

xread
must be followed by properly formatted data input

xsteps{l, c, m, h, w, u}
tree diagnostics

l global length, ci, ri of each tree in current memory
c each character length, ci, ri on each tree
m best and worst fits of each character from among the trees
h optimal state of each character at each internal node of each tree
w assign weights to all characters based on their rescaled consistency indices
u tree compression

xx
Dos Equis, character diagnostics, pay attention now...

(nil) enter into character optimization viewer (do not use a semicolon! typing xx; will enter then exit Dos Equis)
(number) switch to character
/ view next page of tree
\ view previous page of tree
. next line
- previous line
\\(number) delete branch number (number)
\(number1)(number2) rotate branch (number1) to branch (number2)
/2 reweight the current character by 2 (works for any weight of course)
+ make current character ordered
- make current character unordered
] make current character inactive
[ make current character active
= save the revised tree and the revised character codes and exit
; exit without saving anything

yama
accepts no argument, exit from Hennig86 unceremoniously and
without prompting you if you want to save anything