INVESTIGATION OF THE FEASIBILITY OF CONSTRUCTING A MAP FOR COCONUT WITH SEVERAL F2 FAMILIES USING COMPUTER-SIMULATED DATA

A computer simulation was performed with the RiceSim software to explore the practicability of combining several different F2 populations through JoinMap, mimicking the coconut mapping populations actually available. The approach was very successful. The results indicate that JoinMap would be able to map all 16 chromosomes, covering a map length of 1540 cM, except for a single marker on chromosome 8. The largest marker interval was 32 cM at the bottom of chromosome 3, and all other markers were evenly distributed along the chromosomes with spacing of around 12-30 cM between them.
1/ Coconut Research Institute, Lunuwila, Sri Lanka.


INTRODUCTION
Coconut comprises two main varieties, Tall and Dwarf, which are genetically distinct for several important characters because of their different pollination behaviour. Tall coconuts are generally cross-pollinated, while Dwarf coconuts are mostly self-pollinated. Hence, Dwarf coconuts are normally homozygous at many loci, while Tall coconuts are highly heterozygous.
Breeding coconuts for high productivity and better adaptability to drought, pests and diseases is a major research priority in most coconut-growing countries. For a long time, coconut breeding involved testing inter-varietal and intra-varietal hybrids arising from morphologically distinct populations. However, the success of conventional breeding procedures is constrained by several characters of the palm, such as its long juvenile period, outcrossing and heterozygous nature. In addition, the lack of a viable method of vegetative propagation and the low genetic variability have caused many problems for the breeder.
More recently, the use of molecular markers to enhance coconut breeding was initiated. Construction of a genetic map saturated with molecular markers would allow selection for characters to be carried out much more efficiently and effectively than by conventional methods.
The most critical decision in constructing a linkage map with DNA markers is the choice of mapping population. A mapping population should be fairly large so that it captures the genetic information from many segregating gametes, but the currently available coconut populations are rather small for a satisfactory mapping programme. Families are small because a particular mother palm produces only a limited number of seeds (nuts) within a fixed period of time. This is further aggravated by the low success rate of artificial pollination in coconut. The development of inbred lines from heterozygous palms is almost impossible because of the long time taken for seed production in coconut.
Within these constraints, the only populations available in coconut are full-sib families of very small size, half-sib families with few members obtained from controlled pollination using pooled pollen, and several small F2 families. If several different small F2 families could be merged, a mapping population of practical size could be produced.
It would therefore be advantageous to combine, for analysis, small F2 families derived from several genetically similar parents.
The most widely used genetic mapping software is Mapmaker (Lander et al., 1987), but its main disadvantage is its inability to merge maps. JoinMap 2.0 (Stam, 1993) was developed as an alternative that facilitates the integration of genetic maps. When using JoinMap, at least some segregating markers must be common to the maps to be integrated. Integrated maps have already been produced for Arabidopsis (Hauge et al., 1993), barley (Qi et al., 1996) and Brassica oleracea (Sebastian et al., 2000), where a few maps were integrated and markers were common to at least two independent data sets. Given the small size of the coconut F2 families, it is worthwhile to merge several maps to maximize the population size. A simulation study was chosen to explore the reliability of this approach.
Computer-based simulation can be used to study situations that are difficult to explore in practice. It has allowed tentative interpretation of relatively complex genetic comparisons that had not previously been possible (Edwards & Page, 1994). There have, however, been few occasions in the past where results from such simulations were used to investigate new pathways for studies (Crosby, 1973; Sampson, 1984). Simulations can be used as follows.
-To avoid commitment of scarce resources. Even when it cannot replace any part of the actual laboratory work, simulation can often identify the most promising solutions and channel research into the paths most likely to succeed.
-As an alternative to experiments that are impossible, too dangerous, or too costly to perform in the laboratory.
-To give investigators control over inputs to the system that may not be possible in the laboratory.
-To make predictions testable in the laboratory or to suggest refinements for experiments. This use is very important for coconut, and it is desirable to obtain results similar to those that might be gained from real situations, because coconut has a rather complex genome.
In view of the above, the objective of this study was to test, using computer-simulated data, the feasibility of constructing a map for coconut by joining nine F2 populations that mirror the real situation. The results would inform the design of the laboratory experiment to be carried out for mapping the coconut genome.

Mapping population and markers
The actual coconut mapping population, for which seeds had been produced in Sri Lanka, consists of nine different F2 families derived from self-pollination of nine different hybrid (Tall x Dwarf F1) palms. To obtain this mapping population, it was necessary to simulate P1 (highly heterozygous Tall coconut) and P2 (highly homozygous Dwarf coconut) and, from them, a large F1 family. The parental lines were produced using the RiceSim computer software, written in Fortran by Prof. M.J. Kearsey, School of Biosciences, University of Birmingham.
A population of size 1000 in linkage equilibrium was first simulated under the following assumptions.
-Each marker had 2 alleles of equal frequency.
Nine individuals were randomly selected from that large population to represent the nine F1 hybrid plants. When these randomly selected F1 individuals were selfed to produce F2 populations, free recombination was assumed between chromosomes (recombination frequency (RF) = 50%), while RF was set to 20% between each pair of adjacent loci. Each F1 was selfed to produce an F2 population, with family sizes ranging from 40 to 80 (Table 2) to mimic the real families to be used for coconut genome mapping.
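The selfing step described above can be sketched in Python. This is not the RiceSim code (which was written in Fortran) but a minimal illustration under stated assumptions: 16 chromosomes with six markers each (inferred from the six markers assigned to linkage group 1), each F1 drawn directly with equal allele frequencies rather than sampled from the simulated population of 1000, and no crossover interference. All function and constant names are hypothetical.

```python
import random

N_CHROM = 16            # chromosomes (linkage groups) in the study
MARKERS_PER_CHROM = 6   # assumption: six markers per chromosome
RF_ADJACENT = 0.20      # recombination frequency between adjacent loci


def random_f1(rng):
    """Draw one F1 as two gametes; alleles 1 and 2 are equally frequent."""
    n_loci = N_CHROM * MARKERS_PER_CHROM
    gamete1 = [rng.choice((1, 2)) for _ in range(n_loci)]
    gamete2 = [rng.choice((1, 2)) for _ in range(n_loci)]
    return gamete1, gamete2


def meiosis(f1, rng):
    """Produce one recombinant gamete from an F1 individual."""
    gamete = []
    for chrom in range(N_CHROM):
        # A random starting strand per chromosome gives RF = 50%
        # between loci on different chromosomes.
        strand = rng.randrange(2)
        for marker in range(MARKERS_PER_CHROM):
            # Switch strands with probability 0.20 between adjacent loci.
            if marker > 0 and rng.random() < RF_ADJACENT:
                strand = 1 - strand
            gamete.append(f1[strand][chrom * MARKERS_PER_CHROM + marker])
    return gamete


def self_f1(f1, family_size, rng):
    """Self one F1 to give an F2 family; each individual is a gamete pair."""
    return [(meiosis(f1, rng), meiosis(f1, rng)) for _ in range(family_size)]
```

For example, `self_f1(random_f1(rng), 60, rng)` with `rng = random.Random(1)` yields one simulated F2 family of 60 individuals; repeating this for nine F1 individuals with sizes between 40 and 80 mirrors the design described above.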

Setting up a map using JoinMap software
All F2 populations were examined separately to produce data files for JoinMap. A separate locus genotype file was developed for each F2 family using the notation displayed in Table 3. When the two gametes were the same, the individual was entered as a homozygote (AA for gamete type 1; BB for gamete type 2); when the two gametes differed, the individual was entered as a heterozygote (AB).
Finally, each locus genotype file contained three genotype codes: A (homozygous for the female parent, AA), B (homozygous for the male parent, BB) and H (heterozygous for both parents, AB).
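The coding rule above can be written as a small helper. This is an illustrative sketch only; the function names are hypothetical, and alleles are assumed to be coded 1 and 2 as in the simulated gametes. It does not reproduce the JoinMap file layout.

```python
def code_genotype(allele1, allele2):
    """Return A (1/1), B (2/2) or H (1/2 or 2/1) for one locus."""
    if allele1 == allele2:
        return "A" if allele1 == 1 else "B"
    return "H"


def code_individual(gamete1, gamete2):
    """Code every locus of one F2 individual from its two gametes."""
    return [code_genotype(a, b) for a, b in zip(gamete1, gamete2)]
```

For instance, `code_individual([1, 2, 1], [1, 2, 2])` gives `["A", "B", "H"]`.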
Once correct locus genotype files have been produced, the next step in JoinMap is to assign markers to groups based on temporary pair-wise data files. Since JoinMap was unable to perform this step, the loci were grouped manually, omitting monomorphic markers, for each F2 population separately. All subsequent steps of JoinMap, such as splitting each linkage group to produce ordered linkage groups and estimating recombination frequencies between each pair of markers to produce pair-wise data (PWD) files, were run separately for each F2 family until the PWD files were formed. The nine PWD files from the nine F2 populations were then brought together for each linkage group, and mapping was performed using the joined PWD files.
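The idea of bringing pair-wise estimates from several families together can be illustrated as follows. This is not JoinMap's internal procedure, which is not described here; it is only a sketch that assumes each family contributes a recombination estimate per marker pair and that estimates for the same pair are averaged with family size as the weight. The data layout and the function name are hypothetical.

```python
from collections import defaultdict


def pool_pairwise(families):
    """Pool pair-wise recombination frequencies across families.

    families: list of (family_size, {(marker_a, marker_b): rf}).
    Returns {(marker_a, marker_b): size-weighted mean rf}, with each
    pair stored in sorted order so (M2, M1) and (M1, M2) are merged.
    """
    totals = defaultdict(lambda: [0.0, 0])  # pair -> [sum(rf * n), sum(n)]
    for size, pairs in families:
        for (a, b), rf in pairs.items():
            key = tuple(sorted((a, b)))
            totals[key][0] += rf * size
            totals[key][1] += size
    return {pair: weighted / n for pair, (weighted, n) in totals.items()}
```

A pair that is polymorphic in only one family simply keeps that family's estimate, which is why markers shared between families are what tie the individual maps together.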

RESULTS AND DISCUSSION
JoinMap was allowed to assign markers to linkage groups, but it was unable to group them properly: because of monomorphic markers, the number of genes read from the locus genotype file was not equal to the number read from the temporary pair-wise data file. Therefore, grouping was performed manually, deleting monomorphic markers. The linkage groups of each F2 family are illustrated in Table 2. In linkage group 1 of family F2(1), only three markers (M1, M2 and M6) were polymorphic out of the six assigned in Table 1. According to Table 2, all six markers of linkage group 1 were polymorphic in F2(2), F2(4), F2(8) and F2(9), whereas F2(5) and F2(6) had no polymorphic markers at all in this group. The distribution of polymorphic markers among the other linkage groups in each F2 family is also shown in Table 2.
Homozygous loci occurred in F1 individuals where the same type of gamete was received from both grandparents at a locus (Table 3a).
If gamete 1 is taken to have come from the grandmother and gamete 2 from the grandfather, three genotypes can be distinguished in the F1 (Table 3b): AA (gamete 1 = 1 and gamete 2 = 1), BB (gamete 1 = 2 and gamete 2 = 2) and AB (gamete 1 = 1 and gamete 2 = 2). Considering the two gametes of each F1, it is easy to identify which markers will be monomorphic (AA or BB) and which polymorphic (AB) in each F2. According to Table 3, marker 3 and marker 4 would be monomorphic in the F2 derived from F1(1) because the two gametes are the same, so these loci are homozygous in the F1 itself. Such monomorphic markers are unavailable for mapping at this stage because they show no variability. Because the F1 individuals were drawn from a population in linkage equilibrium with equal allele frequencies, almost half of the markers were monomorphic in each F2 family.
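The expected proportion of monomorphic markers follows from the allele frequencies. As a short worked step, let p and q be the frequencies of alleles 1 and 2 (symbols introduced here for illustration), with p = q = 0.5, and assume the two gametes of an F1 are independent draws:

```latex
P(\text{monomorphic}) = P(AA) + P(BB) = p^2 + q^2 = 0.5^2 + 0.5^2 = 0.5,
\qquad
P(\text{polymorphic}) = P(AB) = 2pq = 2(0.5)(0.5) = 0.5
```

Under these assumptions, about half of the markers are expected to be homozygous in any one F1, and therefore monomorphic in its F2 family.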
JoinMap was unable to produce groups because it read only the polymorphic markers. The 16 groups were therefore built manually, deleting monomorphic markers, so some groups were not available in some F2 populations, as illustrated in Table 2 (e.g. groups 3, 11, 13 and 16 in the F2(1) population). Missing groups resulted from all markers on a particular chromosome being monomorphic. The situation became worse when the groups were split and ordered. At this stage, a group must contain at least two markers to be retained for further steps, because no pair-wise relationship can be investigated with one marker per chromosome. Based on the group files (Table 2), JoinMap did the splitting, and groups with only one marker disappeared. Therefore, groups 4, 6, 9 and 12 were lost from the ordered groups of F2(1) (Table 4), in addition to groups 3, 11, 13 and 16, which were already missing in Table 2.
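The rule that a group survives only if it keeps at least two polymorphic markers can be sketched as a filter on the F1 genotype. This is an illustration of the manual grouping logic, not JoinMap code; the function name and the dictionary layout for groups are hypothetical.

```python
def usable_groups(f1_gamete1, f1_gamete2, groups, min_markers=2):
    """Keep linkage groups with enough polymorphic markers.

    groups: {group_id: [locus indices]}. A locus is polymorphic in the
    F2 only if the two F1 gametes carry different alleles there.
    """
    kept = {}
    for group_id, loci in groups.items():
        polymorphic = [i for i in loci if f1_gamete1[i] != f1_gamete2[i]]
        if len(polymorphic) >= min_markers:
            kept[group_id] = polymorphic
    return kept
```

A group with no polymorphic marker is missing from the start, and a group with exactly one is dropped at the ordering stage; both cases are removed by the `min_markers=2` threshold in this sketch.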
As a result, only about 8-10 linkage groups were available in each F2 family for further mapping, as shown in Table 4. The actual PWD files were formed at the next step, JMREC, keeping the LOD threshold to a minimum and the REC threshold to a maximum. JMREC was run with a LOD threshold of 0.001 and a REC threshold of 0.499, and it flagged suspect estimates for each linkage group. Table 5 shows the final linkage map produced by JMMAP using the Kosambi mapping function. Based on the suspect estimates resulting from JMREC, the final map would be restricted to a limited number of markers that are fully compatible and accurate.
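For reference, the Kosambi mapping function converts a recombination frequency r into a map distance d = (1/4) ln((1 + 2r) / (1 - 2r)) Morgans. The following minimal sketch evaluates it in centimorgans; the function name is hypothetical and the code is independent of JMMAP.

```python
import math


def kosambi_cm(rf):
    """Kosambi map distance in cM for a recombination frequency rf."""
    if not 0 <= rf < 0.5:
        raise ValueError("rf must satisfy 0 <= rf < 0.5")
    # d (Morgans) = 0.25 * ln((1 + 2r) / (1 - 2r)); multiply by 100 for cM.
    return 25.0 * math.log((1 + 2 * rf) / (1 - 2 * rf))
```

For the simulated recombination frequency of 20% between adjacent loci, `kosambi_cm(0.20)` is about 21.2 cM, which lies within the 12-30 cM marker spacing reported for the final map.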
Because the F2 populations came from Tall x Dwarf hybrids, the results in Table 5 indicate that JoinMap would be able to map all 16 chromosomes except for one marker on chromosome 8. The map covered a length of 1540 cM. Only one locus (M48) was removed by the JoinMap analysis because of conflicts within the linkage groups. The largest marker interval was about 32 cM at the bottom of chromosome 3. All other markers were evenly distributed along the chromosomes, with spacing of around 12-30 cM between them. The sequence of markers on each chromosome was correct.

Predictions and Refinements based on this study
-Integration of nine small F2 populations can be done successfully, provided that there are at least six markers per chromosome with equal allele frequencies. This could and should be determined at the outset.
-Identification of markers that are polymorphic in each F1 is vital before using them on the F2 populations, in order to reduce wastage of time, energy and money.
-It is essential to use many markers to genotype the populations in order to make the process powerful and thereby achieve coverage of all linkage groups. For this purpose, it is very useful to use AFLP markers in addition to SSR markers to obtain many polymorphic markers for final mapping.