A short introduction to the program

This file helps you take your first steps with the program by walking you through some examples which should (as I hope) show you its most important features. Just follow the steps below to discover it.

1) Go to the directory that contains the file EEDIT.EXE and start the program by typing EEDIT <RETURN>

You are now in the editor. Go to the menu "File" and choose "New" to make a new file. There are several ways to do that: you can type "ALT-F" (press ALT and F at the same time; whenever a character in the menu is red, you can access it by pressing ALT and that character simultaneously), press F10 and use the arrow keys, or use the mouse. Once you've chosen "New", an empty window appears on the screen.

In this window, enter the following definition lines:

   1 : type := { voy, cons }
   2 : art := { bil, alv }
   2 : type := { ocl, fric, nas }
   2 : art := { ant, cent, post }
   2 : art := { haut, moy, bas }
   2 : apert := { ouv, ferm }
   2 : accent := { ton }
   m := [ cons, bil, nas ]
   n := [ cons, alv, nas ]
   s := [ cons, alv, fric ]
   e := [ voy, ant, moy ]
   a := [ voy, cent, bas ]
   o := [ voy, post, moy ]
   V := [ +voy ]
   C := [ +cons ]
   O := [ +cons, -nas ]
   e > a
Now, what does this all mean? As you can see, there are four different parts, which are marked by @defsomething and @endsomething. Those four parts are 1) the feature definitions, 2) the literal (= phoneme) definitions, 3) the symbol definitions and 4) the list of rules. To know more about these parts, refer to the page "Notation of linguistical rules".

Save the file as "test.txt" (go to the menu "File", choose "Save as.." and enter "test.txt"). Now start the calculation (press CTRL-F9 or choose "Start calculation..." from the menu "Run")

[Note 1: if the program doesn't do anything when you choose "Start calculation...", it might be that the editor (EEDIT) cannot call the main program (ETYMO) in your environment. In this case, leave the editor (menu "File" -> "Exit" or ALT-X) and type "ETYMO test.txt" <RETURN> at the DOS prompt. Once you have finished the calculation, you must restart the editor by typing "EEDIT" <RETURN> at the DOS prompt.]

[Note 2: if there are errors (in this case, the program prints an error message on the screen), go back to the editor and open the file ERROR.LOG (this file is automatically generated by the program and contains all the errors and warnings the program found in your rule catalogue). The program indicates the line number and the type of each error. So, once you've opened ERROR.LOG, go back to the rules file (by pressing F6), go to the corresponding error line (the line number is indicated at the bottom of the window) and make the necessary changes. Then restart the program. Note that some errors may be "consecutive errors": for example, if you make a mistake in a feature definition so that a feature group cannot be read by the computer, you might get the error "feature not defined" in a literal definition or in a rule (although the rule and the definition are correct by themselves). Therefore always start correcting with the first error.]

Enter the following word.

#mensem# <RETURN>
[Go here to get a detailed description of how words have to be entered]
You'll get the following result:
Now you can use the keys 'u' and 'n' to move up and down in this evolution. The two characters ">...<" indicate the active word. At the bottom of the screen, you get information about the current evolution step (detailed information about the word and the rule used in the calculation).

As you can see, the same rule has been used twice in our example. In fact, a rule is applied until it does not change the word any more.

Now press 'e' to get back to the editor. Change the rule e > a into e > e and start the calculation again (CTRL-F9). You get

As you can see, the rule is applied only once, since it does not change the word. Note that in general, you shouldn't write rules like that except for multiple evolutions (see below). The problem with rules of this type is the phenomenon called "feeding". You can observe it by changing the rule to e > ee (go back to the editor, change the rule and start the calculation again). You'll get something like:
In this example, the output of the rule becomes a new input for the same rule. Therefore, the rule could be applied ad infinitum (the program, however, stops the calculation after a certain number of iterations for safety reasons). As you can see, from the 9th step on the word has become too long to be stored in the variable (the program limits the length of a word to 15 phonemes).
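
The "apply until nothing changes" loop and the feeding problem can be sketched in a few lines of Python. This is only an illustration and not the program's actual code: it treats a word as a plain string (without the # marks), rewrites one occurrence per step, and the names apply_until_stable, max_steps and max_len are made up for this sketch. The limit of 15 mirrors the word-length limit mentioned above.

```python
def apply_until_stable(word, source, target, max_steps=20, max_len=15):
    """Apply the rule 'source > target' one occurrence at a time.

    Stops when the word no longer changes, when it would exceed
    max_len characters, or after max_steps steps (safety stop).
    Returns the list of all intermediate words.
    """
    history = [word]
    for _ in range(max_steps):
        new_word = word.replace(source, target, 1)  # rewrite one occurrence
        if new_word == word:          # rule no longer changes the word
            break
        if len(new_word) > max_len:   # word would become too long
            break
        history.append(new_word)
        word = new_word
    return history


print(apply_until_stable("mensem", "e", "a"))        # rule used twice
print(apply_until_stable("mensem", "e", "e"))        # changes nothing
print(len(apply_until_stable("mensem", "e", "ee")))  # feeding: stopped by the limits
```

With e > a the loop ends by itself; with e > ee only the two limits stop it, which is exactly why self-feeding rules are a problem.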

So, rules that present an "auto-feeding" (i.e. the rule "feeds itself") are simply not allowed. Now, it might of course be necessary to formulate a rule that converts 'e' into 'ee'. In this case you must reformulate the rule as

CeC > CeeC

If you try the same calculation again, you'll get
Now, let's make things a little bit more complicated. In general, "rescription tools" (that's the technical term for what the program does) convert an input A into an output B. In other words: normally, there is only ONE output B that can be produced from an input A. In our program, however, we considered that this was not very reasonable: in fact, there are certain combinations of sounds that may evolve in two or more ways (as an example, we could mention the words "hominem" and "dominum", which become "hombre" and "dueño": as we can see, the secondary group m'n can either be assimilated (> nn) and then palatalized (> ñ), or dissimilated (> mr), after which an epenthetic consonant b is inserted). So, the program offers the possibility to formulate several consequences, which leads to what we call "multiple evolution".

Coming back to our example: now replace the old rule with the following two new ones:

e > { e, a, o }

a > e

When you start the calculation, you'll maybe understand why we consider a computer to be a very useful instrument in diachronic linguistics... In fact, a word as simple as "#mensem#" plus two rules (which are not very complicated either) can produce a result as complex as the following "evolution tree":

 #mensem#               #mansem#                          #monsem#             
               +----------+-----------+          +-----------+-----------+     
            #mansem#   #mansam#    #mansom#   #monsem#    #monsam#    #monsom# 
               |          |           |                      |                 
            #mensem#   #mensam#    #mensom#               #monsem#             
Note that we have in this example a phenomenon we call "trans-feeding": the first rule produces an output ("a") that becomes an input for the second rule.
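
To get a feeling for how quickly a multiple evolution grows, here is a small Python sketch. It is not how the program computes its tree; it simply assumes a hypothetical rule that offers three outcomes for every "e" and lists all combinations for a plain string. The function name expand is invented for this example.

```python
from itertools import product


def expand(word, source, alternatives):
    """Return every word obtained by replacing each occurrence of
    'source' independently with one of the 'alternatives'."""
    # Each position is a list of choices: several for 'source', one otherwise.
    slots = [alternatives if ch == source else [ch] for ch in word]
    return ["".join(choice) for choice in product(*slots)]


variants = expand("mensem", "e", ["e", "a", "o"])
print(len(variants))  # two occurrences with three choices each
print(variants)
```

Two occurrences with three outcomes each already give nine words, and every further rule that can "trans-feed" on one of them adds more branches.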

You can use the keys 'h' and 'j' to go to the left or the right side.

Let's now talk about the way linguistic data is stored in the memory of the computer, in order to understand how rules are applied to words.

When we have a word like "#mensa#", each phoneme (i.e. m - e - n - s - a) is translated into a "one-dimensional vector". Don't worry: a one-dimensional vector is nothing but a series of values. Each element of the vector represents a feature group: if you have a feature group (in the @deffeatures - @endfeatures part), for example

1 : type := { voc, cons }
the computer gives it a number (say 4). After that, it also gives a number to each element of the group (for example voc = 1, cons = 2). Now every feature can be represented by a pair of values: for example, the feature "voc" would be (4,1), the feature "cons" = (4,2).

A good way to understand this mechanism is to have a look at the file DATA.LOG (just load it with the editor), which is automatically generated by the program. In our example, this file contains the following information:

[ 3]: 1 : type := { 1-cons 2-voy }
[ 4]: 2 : art := { 1-alv 2-bil }
[ 5]: 2 : type := { 1-nas 2-fric 3-ocl }
[ 6]: 2 : art := { 1-post 2-cent 3-ant }
[ 7]: 2 : art := { 1-bas 2-moy 3-haut }
[ 8]: 2 : apert := { 1-ferm 2-ouv }
You can see the group numbers in the feature definitions (values between []) and the element numbers (connected with the feature name by '-').

As we've said, each element of a phoneme vector represents a feature group. That means that if the phoneme contains, for example, the feature "voc", the 4th element (= group number) of the vector is set to 1 (= element number). In our example "#mensa#" the features "voc" and "cons" are stored in the following way:

m = { 0, 0, 0, 2, ... }
e = { 0, 0, 0, 1, ... }
n = { 0, 0, 0, 2, ... }
s = { 0, 0, 0, 2, ... }
a = { 0, 0, 0, 1, ... }
Note that the value 0 means that no feature of the corresponding group is set.
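
The same idea as a small Python sketch. It only mirrors the example numbering used in the text (group "type" = 4, voc = 1, cons = 2); the vector length of 8, the vowel list and all names in the code are assumptions made for the illustration, not values taken from the program.

```python
# Assumed numbering, following the example in the text:
# feature name -> (group number, element number)
FEATURES = {"voc": (4, 1), "cons": (4, 2)}
VECTOR_LENGTH = 8   # arbitrary length chosen for this sketch
VOWELS = "aeiou"    # assumption: which characters count as vowels


def make_vector(feature_names):
    """Build a phoneme vector: one slot per feature group, 0 = nothing set."""
    vector = [0] * VECTOR_LENGTH
    for name in feature_names:
        group, element = FEATURES[name]
        vector[group - 1] = element  # group 4 -> 4th element of the vector
    return vector


for phoneme in "mensa":
    features = ["voc"] if phoneme in VOWELS else ["cons"]
    print(phoneme, "=", make_vector(features))
```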

Now we have talked about the phonemes, but we haven't talked yet about the character '#'. The character '#' belongs to a group of three characters that stand for frontiers:

|      stands for a frontier of syllables    position 1
+      stands for a frontier of morphemes    position 2
#      stands for a frontier of words        position 3
A frontier is nothing but a predefined feature that - like any other feature - can be combined with a phoneme. The position number corresponds to the group number. The element number of |, + and # is 1. So, | corresponds to the feature (1, 1), + is (2, 1) and # = (3, 1).

If we have the word "#men|sa#", it would be stored as follows:

m = { 0, 0, 0, 2, ... }
e = { 0, 0, 0, 1, ... }
n = { 1, 0, 0, 2, ... } frontier |
s = { 0, 0, 0, 2, ... }
a = { 0, 0, 1, 1, ... } frontier #
So, when a phoneme contains a frontier feature, it means that the frontier comes immediately AFTER the sound.

Now, you may ask what we do with the first # in "#mensam#"... Since a frontier feature combined with a phoneme means that the frontier comes after the phoneme, we cannot combine it with the first phoneme of the word.

To solve this problem, we introduced what we call a "dummy sound" at the beginning of the word. For certain reasons that we're going to explain later on, this "dummy sound" is represented as "@". Now we have all the necessary information to know how the word "#men|sa#" is translated by the program:

@ = { 0, 0, 1, 0, ... } frontier #   -> dummy sound
m = { 0, 0, 0, 2, ... }
e = { 0, 0, 0, 1, ... }
n = { 1, 0, 0, 2, ... } frontier |
s = { 0, 0, 0, 2, ... }
a = { 0, 0, 1, 1, ... } frontier #
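
As a sketch, the translation of a word into sounds with frontier features could look like this in Python. It is an illustration under assumptions, not the program's code: only the first four vector positions are modelled, the values voc = 1 and cons = 2 follow the listing above, the vowel list is a guess, and the function name encode is invented.

```python
FRONTIER_INDEX = {"|": 0, "+": 1, "#": 2}  # vector slot of each frontier
TYPE_INDEX = 3                             # slot holding voc (1) / cons (2)
VOWELS = "aeiou"                           # assumption for this sketch


def encode(word):
    """Turn a word like '#men|sa#' into a list of (sound, vector) pairs."""
    sounds = []
    for ch in word:
        if ch in FRONTIER_INDEX:
            if not sounds:
                # A frontier before any sound needs the dummy sound '@'.
                sounds.append(("@", [0, 0, 0, 0]))
            # The frontier is stored on the sound it follows.
            sounds[-1][1][FRONTIER_INDEX[ch]] = 1
        else:
            vector = [0, 0, 0, 0]
            vector[TYPE_INDEX] = 1 if ch in VOWELS else 2
            sounds.append((ch, vector))
    return sounds


for sound, vector in encode("#men|sa#"):
    print(sound, "=", vector)
```
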
So far so good. Now we know what the computer does with the words. Next, let's see what happens when the computer gets a rule like e > a.

As with the words, the computer first translates the rules into an internal data structure. It is not exactly the same as for words, but it is quite similar. In order to understand it, however, we must first talk about the definition of the phonemes.

As we've already said, each phoneme (literal) is defined by a list of features, each of which can be described by a pair of values. So, the sound 'e' for example, defined by [+voc, +ant, +moy], is translated into the following list:

e := [(7,2)(6,3)(3,2)] (7,2)="moy",(6,3)="ant",(3,2)="voc"

=> Note that the computer inverts the order of the features (this is not important in this case).

Now, whenever an "e" stands in the condition of a rule, the computer does not simply compare characters with characters; it compares the list of feature values (the pairs) with the phoneme vectors described above. In other words, the condition "e" is true when the phoneme vector p contains at least the following three elements:

p = { x, x, x, 2, x, x, 3, 2, x, x, ..., x }
"x" means here that ANY value can stand at that position. (Note that the computer starts counting the groups at 0, so that the feature (3,2) stands at the fourth position.)

The computer does the same with the symbols. The symbol "V" for example is translated into the following list of pairs:

V := [(3,2)]

Therefore, "V" matches any phoneme that has a vector of the type

p = { x, x, x, 2, x, x, x, x, x, x, x, ... , x }

But, unlike literals, symbols can also contain negated features. A negated feature is the same as its non-negated equivalent, with the only difference that the element number is negative (the group number, however, stays positive). In our example, we have defined the symbol "O", which stands for "oral consonants". This means that it must be a consonant ("+cons") and it must not be a nasal ("-nas"). If we look at the DATA.LOG file, we can see that the computer has translated the negated feature into the following:

O := [(5,-1)(3,1)]

The values (5,1) correspond to "+nas", (5,-1) means "-nas". "O" matches a phoneme vector of the following type:

p = { x, x, x, 1, x, !1, x, x, x, ... x }

!1 means that if the value 1 stands at position 5 (the 6th position when counting from 1), the condition is not true.
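
The comparison of a feature list with a phoneme vector, including negated features, can be sketched in Python as follows. The pairs are the ones shown above and in DATA.LOG; the function names matches and vector_from, the variable names and the vector length of 9 are assumptions made only for this illustration.

```python
def matches(pairs, vector):
    """True if the vector satisfies every (group, element) pair.

    A positive element must be present at position 'group';
    a negative element means that value must NOT be there.
    """
    for group, element in pairs:
        if element > 0 and vector[group] != element:
            return False
        if element < 0 and vector[group] == -element:
            return False
    return True


def vector_from(pairs, length=9):
    """Build a phoneme vector from the pairs of a literal definition."""
    vector = [0] * length
    for group, element in pairs:
        vector[group] = element
    return vector


E_LITERAL = [(7, 2), (6, 3), (3, 2)]   # e := [voy, ant, moy]
V_SYMBOL = [(3, 2)]                    # V := [+voy]
O_SYMBOL = [(5, -1), (3, 1)]           # O := [+cons, -nas]

e_vec = vector_from(E_LITERAL)
n_vec = vector_from([(3, 1), (4, 1), (5, 1)])  # n := [cons, alv, nas]
s_vec = vector_from([(3, 1), (4, 1), (5, 2)])  # s := [cons, alv, fric]

print(matches(V_SYMBOL, e_vec))  # "e" is a vowel
print(matches(O_SYMBOL, s_vec))  # "s" is an oral consonant
print(matches(O_SYMBOL, n_vec))  # "n" is nasal, so "O" does not match
```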

But let's be a little bit more practical again! Suppose that we want to convert "#mensa#" into "#mesa#". We could write the following rule: ns > s. If we enter now "#mensa#" we get, indeed, the word "#mesa#" - which looks quite good. Now let's enter "#men|sa#" (don't forget the |). The result is "#mesa#". As we can see, the rule has deleted the |-frontier. Of course, we would like the computer not to delete this frontier. So, how can we do that?

The solution to that problem is the "empty sound" (which we represent by an ampersand (&)). Now, what is an empty sound? Let me put it the following way: an empty sound is so empty that it hardly sounds... but it is not so empty that it could not be considered a sound... ;-)) In fact, an empty sound cannot contain any feature except the predefined frontier features (|+#). Note that an empty sound coincides with the preceding sound.

Now, you have to know one more thing to understand how we can solve the frontier problem. When we have a condition with several "items" and a consequence with several "items" too, each item of the consequence must be assigned to an item of the condition. In the rule

V V V > V V

there is a vowel that falls out. The question now is: which one falls out? In this case, it would be the 3rd vowel, because the computer assigns the symbols automatically from left to right:

Condition:      V V V
                | | -
Consequence:    V V
However, if you'd like the second vowel to fall out you can use numbers between 1 and 9 to "mark" the sound.

V1 V2 V3 > V1 V3 (or simply: V1 V V2 > V1 V2)
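
The assignment of condition items to consequence items can be sketched like this in Python. It is a guess at the behaviour described above, not the program's algorithm: items are separated by spaces here for simplicity, marked items are paired by their number, and unmarked items are paired from left to right when they use the same character. The names parse_items and assign are invented.

```python
import re


def parse_items(side):
    """'V1 V V2' -> [('V', 1), ('V', None), ('V', 2)]"""
    items = []
    for token in side.split():
        m = re.fullmatch(r"(\D)([1-9])?", token)
        if m is None:
            raise ValueError("cannot read item: " + token)
        items.append((m.group(1), int(m.group(2)) if m.group(2) else None))
    return items


def assign(condition, consequence):
    """For each condition item, return the index of the consequence item
    assigned to it, or None if the item falls out."""
    cond, cons = parse_items(condition), parse_items(consequence)
    result = [None] * len(cond)
    used = set()
    # Step 1: items with a marker are paired by their marker number.
    for i, (_, mark) in enumerate(cond):
        if mark is None:
            continue
        for j, (_, cmark) in enumerate(cons):
            if cmark == mark and j not in used:
                result[i] = j
                used.add(j)
                break
    # Step 2: unmarked items are paired left to right, same character only.
    for i, (sym, mark) in enumerate(cond):
        if mark is not None:
            continue
        for j, (csym, cmark) in enumerate(cons):
            if cmark is None and csym == sym and j not in used:
                result[i] = j
                used.add(j)
                break
    return result


print(assign("V V V", "V V"))        # third vowel falls out
print(assign("V1 V2 V3", "V1 V3"))   # second vowel falls out
print(assign("V1 V V2", "V1 V2"))    # same effect, shorter notation
```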

Coming back to our example from above: we can use a marker to preserve the |-frontier in "#men|sa#" by writing:

n1s > &1s

[Note: you probably won't be able to type an '&' in the editor (no idea why it does that..., really). You should know, however, that even if a character is not available on your keyboard, you can enter it by holding down ALT and entering the ASCII code of the character (the ASCII code is the number of the character; it must be between 0 and 255). So, if you want to type an 'a' you can hold down the ALT key and type the number 97 (use the numeric keypad, not the normal number keys!). When you release the ALT key, the character appears. So, you can enter '&' by holding down ALT and entering 38. If you don't know the ASCII code of a character, you can go to the menu "Windows" -> "ASCII_tab". This will open a window that shows all the ASCII codes your computer can display. (Note that you can switch between the ASCII_tab window and the editor window by pressing F6.) The ALT + number method is very useful when you want to enter phonemes that you can't find on your keyboard (like 'ß' or '?' for example).]

How does this rule work? The computer first tests the condition, which matches "n|s" in "#men|sa#". Then, it cuts the word into three parts: a left one (#me), a middle one (n|s) and a right one (a#). Now, the middle part must be rebuilt. In order to do that, the computer has assigned the items of the rule in the following way:

condition:     n1 s 
               |  |
consequence:   &1 s
It now takes the first item of the consequence (i.e. the empty sound). This sound, as it is assigned to "n", should "inherit" all the features of "n". But as it is empty, this is not possible except for the three predefined features |, + and #. As we've said above, the empty sound coincides with the preceding sound, i.e. "e" in our example. Therefore, any frontier contained by "n" is written to "e" (passing through the empty sound).
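
The effect of replacing a sound by an assigned empty sound can be sketched in Python. This is a simplified model and not the program itself: a word is a list of sounds, each with a set of frontier marks, and the names parse, render and replace_by_empty are invented for the illustration.

```python
FRONTIERS = "|+#"


def parse(word):
    """'#men|sa#' -> list of [sound, set of frontiers following it]."""
    sounds = []
    for ch in word:
        if ch in FRONTIERS:
            if not sounds:
                sounds.append(["@", set()])  # dummy sound for the first '#'
            sounds[-1][1].add(ch)
        else:
            sounds.append([ch, set()])
    return sounds


def render(sounds):
    """Inverse of parse: write the sounds back as a string."""
    out = []
    for ch, frontiers in sounds:
        if ch != "@":
            out.append(ch)
        out.extend(f for f in FRONTIERS if f in frontiers)
    return "".join(out)


def replace_by_empty(sounds, index):
    """Model of 'X1 > &1': the sound disappears, but its frontiers are
    written to the preceding sound, with which the empty sound coincides."""
    sounds[index - 1][1].update(sounds[index][1])
    del sounds[index]
    return sounds


word = parse("#men|sa#")
print(render(replace_by_empty(word, 3)))  # index 3 is the "n"
```

In this model the |-frontier survives the deletion of "n" because it moves onto "e".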

Now, suppose you have a word like "#ma|ne|+sa#" and you want to make the "e" fall out. It is evident that in this case the |-frontier must fall out. We could write a rule e > &, which transforms the word into "#ma|nsa#" (in this case, as & is not assigned to e (because we don't use a marker and the two characters are different), it is a really empty sound, i.e. it contains really nothing). Unfortunately, the +-frontier is deleted at the same time. The question now is: how can we write a rule that deletes the |-frontier but does not do anything to the +-frontier?

Since we want to preserve the +-frontier, we must assign the empty sound & to "e" by using a marker.

e1 > &1

This converts "#ma|ne|+sa#" into "#ma|n|+sa#". Now we must delete the |-frontier. We can do that by negating the frontier in the consequence:

e1 > &1!| (! = negation)

[Of course, it would be more logical not to delete the |-frontier after n, but the one after a. We will see how we can do that later on.]

Empty sounds are quite funny because they allow somewhat "strange" combinations: for example, you can use several empty sounds followed by each other. The rule

e1n2s3a4 > e1&2&3&4

for example would transform "#men|s+a#" to "#me|+#" (the "e" accumulates the frontiers of the other sounds that fall out).

You can also use empty sounds in the condition: V&[+ton] for example is equivalent to V[+ton]. In this case, it doesn't make much sense of course to use the empty sound. It is very useful however in combination with the jumpers.

A jumper is a character that stands for none, one or several sounds, which can be "jumped". There are two types of jumpers: the star-jumper (*) and the point-jumper (.). The difference is that the star-jumper can jump anything, whereas the point-jumper cannot pass over |-frontiers. Let's look at a practical example.

If you write a rule e*a > a*e and try it with the word "#men|sam#" you'll get "#man|sem#". If you change the rule to e.a > a.e, however, the result is not the same: the condition does not correspond to the word because the |-frontier after n "blocks" the jumper.
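
The difference between the two jumpers can be imitated with regular expressions in Python. This is only an analogy on plain strings (the program works on phoneme vectors, not on characters), and the function names condition_to_regex and swap_e_a are invented for this sketch.

```python
import re


def condition_to_regex(condition):
    """Translate a condition with jumpers into a regular expression."""
    parts = []
    for ch in condition:
        if ch == "*":
            parts.append("(.*?)")      # star-jumper: may jump anything
        elif ch == ".":
            parts.append("([^|]*?)")   # point-jumper: blocked by '|'
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def swap_e_a(word, jumper):
    """Model of the rules e*a > a*e and e.a > a.e on a plain string."""
    pattern = condition_to_regex("e" + jumper + "a")
    # Keep the jumped part, swap the two vowels around it.
    return pattern.sub(lambda m: "a" + m.group(1) + "e", word, count=1)


print(swap_e_a("#men|sam#", "*"))  # the star-jumper passes the |-frontier
print(swap_e_a("#men|sam#", "."))  # the point-jumper is blocked: no change
```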

Now, what is this good for? You can use the point-jumper to determine the "syllabic" position of a phoneme in a word. If we had a word like "#ma|ni|sa#" that becomes "#ma|nsa#", we must be able to express that the vowel that falls out must be the second one in a three-syllable word. We can do that with the following rule:

@#.V1.&|.V2.&|.V3.&# > @.V1.&.&2.&.V3.&

[If you can't enter the '@', try ALT-6-4]

Note that in general, every element of the left side (condition) must be present on the right side (consequence) if you don't want it to fall out. This is also true for the dummy sound "@", the empty sound and for jumpers.

Now let's try to understand this rule: the "@" at the beginning of the rule indicates that the condition can only be true if it starts with the dummy sound, i.e. at the very beginning of the word. This sound must contain a #-frontier, which marks the beginning of the word. Afterwards, there can be 0, 1 or more sounds of any type (here we have exactly 1, namely "m"), followed by a vowel (here the first "a"). Now comes the interesting part of the rule: there can be 0, 1 or several sounds of any type, followed by an empty sound with a |-frontier. Since a jumper can stand for 0 sounds and since the empty sound can coincide with the preceding sound, the first possibility to have a valid |-frontier is immediately after "a". This is true here, and so the third jumper starts after this |-frontier. No jumper has passed a |-frontier so far, so the condition can still be true if the rest of it is true. The rest of the rule works in the same way.

I hope this short introduction has given you a brief overview of the capabilities of the program. Now, read the file RULES.TXT to see more examples of rules.