METHODOLOGICAL ANALYSIS OF DISTANCE TRAINING IN THE EUROPEAN UNION
Desmond Keegan
Distance Education International Ltd
Chapter 45
Methodological analysis

Methodological difficulties

The Voctade project was faced with a difficult methodological problem: that of counting and analysing all the EU citizens in 16 national education systems who were enrolled in distance systems.

It is often said that it is difficult to get statistical data on conventional face-to-face training because not all EU governments collect and publish data in the same form on the complex world of vocational and technical and professional training systems.

It has been said that collecting and analysing data on distance vocational, technical and professional training systems would be even harder than collecting data on conventional face-to-face instructor-led systems. A priori this is true, because distance students, in the main, study in the anonymity of their own homes and do not attend the European training centres, colleges, employment and training agencies that conventional training students do.

A methodology was required that would accurately analyse and quantify a complex field spread over the whole of the European Union.

Methodological solution

Fortunately such a methodology is available.

The Delphi method, developed by Professor Norman C Dalkey and his associates at the University of California at Los Angeles in the mid to late 1960s and justified in empirical studies in the 1970s and later, is designed for

In what follows the Delphi methodology is analysed and its suitability for the Voctade project is justified.

Foundations of the Delphi methodology

In a series of major studies, Dalkey and his associates at the Cognitive Systems Laboratory at the University of California at Los Angeles (UCLA) set out the foundations of the Delphi methodology. Today it is accepted as the standard tool for fairly specific research problems on which a number of persons are knowledgeable, but for which it is impossible to find any one person or group who knows the whole research problem.

Dalkey and his associates established the methodology in a series of publications among which the most important are:

Dalkey, N (1969) An experimental study of group opinion, Futures 1,5, 408-426

Dalkey, N (1976) Group decision analysis, in Zeleny, M (ed) Multiple criteria decision making. Amsterdam: Springer

Dalkey N (1979) Group decision theory. UCLA-ENG-7749

Dalkey N (1980) The aggregation of probability estimates. UCLA-ENG-CSL-8025

Dalkey, N et al (1972) Studies in the quality of life. Lexington, Mass : Heath

Linstone, H and Turoff, M (eds) (1975) The Delphi method. Reading, Mass.: Addison-Wesley

Dalkey, N, Brown, B and Cochran, S. The Delphi method III: Use of self-ratings to improve group estimates. RM-6115-PR (Rand Corporation)

Dalkey, N. Analyses from a group opinion study. Futures 1,6,541-551

In 1970 Dalkey and his co-workers, Brown and Cochran, presented the Delphi methodology thus:

The Delphi method is a set of procedures for formulating a group judgement for subject matter where precise information is lacking. In general, the procedures consist of obtaining individual answers to pre-formulated questions either by questionnaire or by some other formal communication technique; iterating the questionnaire one or more times where the information feedback between rounds is carefully controlled by the exercise manager; taking as the group response a statistical aggregate of the final answers.

In previous studies it has been shown that the Delphi procedures led to increased accuracy of group responses more often than not, and that both the spread of answers (standard deviation of responses on a given question) and a self-rating index (average of individual self-ratings on a given question) are valid indicators of the mean accuracy of group responses.

In a major article in 1969 Dalkey had presented the methodology:

The Delphi technique is a method of eliciting and refining group judgements. The rationale for the procedures is primarily the age-old adage 'Two heads are better than one', when the issue is one where exact knowledge is not available. The procedures have three features:

Anonymous response - opinions of members of the group are obtained by formal questionnaire.

Iteration and controlled feedback - interaction is effected by a systematic exercise conducted in several iterations, with carefully controlled feedback between rounds.

Statistical group response - the group opinion is defined as an appropriate aggregate of individual opinions on the final round.

These features are designed to minimise the biasing effects of dominant individuals, of irrelevant communications, and of group pressure towards conformity.

In the spring of 1968, a series of experiments were initiated at RAND to evaluate the procedures. The experiments are also designed to explore the nature of the information processes occurring in the Delphi interaction. The experiments were conducted using upper-class and graduate students from UCLA as subjects and general information of the almanac type as subject matter. Ten experiments, involving 14 groups ranging in size from 11 to 30 members, were conducted. About 13,000 answers to some 350 questions were obtained.

The two basic issues being examined were a comparison of face-to-face discussion with the controlled feedback interaction, and a thorough evaluation of controlled feedback as a technique for improving group estimates. The results indicated that, more often than not, face-to-face discussion tended to make the group estimates less accurate, whereas, more often than not, the anonymous controlled feedback procedure made the group estimates more accurate. The experiments thus put the application of Delphi techniques in areas of partial information on much firmer ground.

Of greater long-range significance is the insight gained into the nature of the group information processes. Delphi procedures create a well-defined process that can be described quantitatively. In particular, the average error on round one is a linear function of the dispersion of the answers. The average amount of change of opinion between round one and round two is a well-behaved function of two parameters - the distance of the first-round answer from the group median, and the distance from the true answer.

In a series of telephone discussions in 1996 Professor Dalkey, now Professor Emeritus of Cognitive Systems at the University of California at Los Angeles, confirmed the suitability of the Delphi methodology for the Voctade project.

The Delphi methodology

If, as the age-old adage suggests, 'Two heads are better than one', then logically 'n' heads should at least be better than two (certainly no worse). And so the science of opinion is given recognition by the Delphi method. It utilises the systematic manipulation of expert opinion to advance the areas of group problem solving to a new scientific height.

The Delphi method is a set of procedures used for formulating factual group judgements in the areas of management forecasting, broad or long range policy creation and in the prediction of logical events.

It is a technique for eliciting and refining the group judgements of recognised experts in situations where exact knowledge is not available but where a rich set of partial information is available at the disposal of the group on an individual basis.

Thus, although the experts might not know the answer, they do have other relevant information that enables them to make estimates. The cognitive route from 'other relevant information' to an estimate is neither immediate nor direct, which ensures that the exercise remains in the realm of opinion.

The Delphi theory

In general the Delphi procedure consists of obtaining individual answers to pre-formulated questions, either by questionnaire or by some other formal (traditional) technique, iterating the questionnaire one or more times where the information fed back between rounds is carefully controlled by the exercise manager and taking as the 'group' response some suitable summary statistical function of the individual responses on the final round.

Clearly the systematic and controlled interactive form of a Delphi exercise is in marked contrast to the informal and 'loose' interaction of traditional round the table face-to-face discussions. The work of Dalkey and others has been concerned with the validation of the Delphi technique and its improvement by identifying aspects of its structure that can be scientifically modelled and/or optimised. They have broken down the procedure into three separate functional parts as follows:

Anonymous response

Opinions of members of the group are obtained by the questionnaire or other formal communication channels such as on-line computer communication that is increasingly available today.

Iteration and controlled feedback

Interaction is effected by a systematic exercise conducted in several iterations (rounds) with carefully controlled feedback between rounds, whereby a summary of the results of the previous round is conveyed back to the participants.

Statistical group response

The group opinion is defined as the appropriate statistical aggregate of individual responses (opinions) on the final round.
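The three functional parts can be put together in a short sketch. This is an illustration only, not Dalkey's procedure as implemented at RAND: the function `run_delphi`, the `revise` callback and the respondent rule `cautious` are hypothetical names, and the choice of the median as both feedback and final aggregate is an assumption made here for simplicity.

```python
from statistics import median


def run_delphi(first_round, revise, rounds=2):
    """Run a toy Delphi exercise and return the group response.

    first_round: anonymous numerical estimates, one per respondent.
    revise: hypothetical callback (own_estimate, feedback) -> new estimate,
            standing in for a respondent reconsidering after feedback.
    rounds: total number of rounds, including the first.
    """
    estimates = list(first_round)
    for _ in range(rounds - 1):
        feedback = median(estimates)  # controlled feedback between rounds
        estimates = [revise(e, feedback) for e in estimates]
    return median(estimates)  # statistical group response on the final round


def cautious(own, feedback):
    """Hypothetical respondent: moves one third of the way toward the feedback."""
    return own + (feedback - own) / 3


group_response = run_delphi([120, 200, 260, 900, 150], cautious, rounds=3)
```

Respondents never see who gave which answer; they only see the summary passed to `revise`, which is the sense in which the response is anonymous.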

The traditional way of pooling individual opinions within a group is by face-to-face discussion. Numerous studies by psychologists in the two decades preceding Dalkey's work, circa 1969, had demonstrated some serious difficulties with this type of interaction. Among the most serious are:

Influence of dominant individuals

The group is highly influenced for example by the person who talks the most. There is very little correlation between pressure of speech and knowledge.

Noise

Noise here means semantic noise, not the prevailing auditory level. Much of the communication in a discussion group has to do with individual and group interests, not with problem solving per se. Although this kind of communication may appear problem-oriented, it is often irrelevant or biasing.

Group pressure for conformity

Group pressure can have the effect of distorting individual judgements.

The features of the Delphi method are designed to minimise these negative effects. In short, Delphi seeks to use group information more effectively. The statistical group response ensures that the opinion of every member of the group is represented in the final response.

As well as being in accord with good statistical practice it is also good for the group morale. Along with the 'scientific' advantages of a Delphi exercise already mentioned, there are several other properties that should be noted. The procedure is, above all, a rapid and relatively efficient way to 'cream the tops of the heads' of a group of knowledgeable people whilst at the same time generally involving much less effort on the part of the respondent.

A Delphi exercise, properly conducted, can be a highly motivating environment for the respondents and the novelty of the feedback can be interesting to all. There is a reassuring air of objectivity to the outcomes whether they be spurious or not. Finally, the anonymity of the exercise allows a sharing of responsibility and releases the respondents' inhibitions.

Practitioners of Delphi have noted that almost always there is a greater acceptance of the results on the part of the group than on any consensus arrived at by more direct forms of interaction. These features are desirable in any group setting especially if the exercise is conducted in the context of policy formulation where group acceptance is an important consideration.

Evaluation of the methodology

Much of the early evaluation of Delphi was conducted by Dalkey and others at the RAND Corporation in California in the spring of 1968 and the main results communicated in his papers. The two basic issues being examined were a comparison of face-to-face discussion (non-systematic) with controlled feedback interaction (Delphi) and a thorough evaluation of controlled feedback as a technique for improving group estimates.

The results indicated that, more often than not, face-to-face discussions tended to make the group estimates less accurate whereas, more often than not, the Delphi procedure led to increased accuracy in group responses. Also, it was shown that for a given question, by combining the spread of answers (defined as the standard deviation of individual responses/estimates within the group) and a group self-rating index (average of individual self-ratings for a given question), a meaningful estimate of the mean accuracy of a group response could be obtained.

The realisation that Delphi techniques could be applied more and more was a two-stage process. In experiments at RAND it had to be shown that after face-to-face discussion, more often than not, the group response was less accurate than a simple median of the individual estimates without any preceding discussion, in one single interrogative round.

Dalkey gives a heuristic argument for choosing the median as a suitable summary statistic. Consider the case where you have a group of equally qualified experts indistinguishable in a statistical sense i.e. you have no way of asserting that one expert is more knowledgeable than another. Each provides a numerical estimate to a particular question.

It is clear that, independent of the distribution of answers and independent of the location of the true answer, the median of the individual estimates is at least as close to the true value as one half of the group answers. If the range of responses includes the true answer, then, in general the median is closer to the true answer than more than half of the group. In practical situations the range of answers is very likely to include the true answer, in which case the latter assertion is valid. So, much can be gained by the simple arithmetical (statistical) pooling of individual responses.
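The heuristic argument can be checked numerically. The sketch below counts what share of individual estimates are no closer to the true value than the group median is; the estimates and true values are invented for illustration, and `share_no_closer_than_median` is a hypothetical helper name.

```python
from statistics import median


def share_no_closer_than_median(estimates, true_value):
    """Fraction of individual estimates whose error is >= the median's error."""
    group_error = abs(median(estimates) - true_value)
    no_better = sum(1 for e in estimates if abs(e - true_value) >= group_error)
    return no_better / len(estimates)


estimates = [40, 55, 70, 90, 300]  # illustrative values, median is 70

# True answer inside the range of responses: the median beats most respondents.
inside = share_no_closer_than_median(estimates, true_value=60)

# True answer outside the range: the median still matches or beats half.
outside = share_no_closer_than_median(estimates, true_value=500)
```

In both cases the share is at least one half, and it is larger when the true answer lies within the range of responses, as the argument predicts.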

In order to compare Delphi with traditional methods, Dalkey used the following set-up. His first experiment involved ten volunteer students at the RAND Corporation, divided into two groups of five. Twenty almanac-type questions (for which the answers were already known) were presented in four blocks of five each. One group followed an ABBA test design, A denoting face-to-face discussion and B denoting questionnaire feedback; the other group followed the reverse design (BAAB).

Thus, each group answered ten questions in discussion sessions and ten in individual questionnaire sessions. The experimental design employed is a modified cross-over type design which has specific statistical advantages in terms of inferential power and accuracy. The basic feedback between several rounds was the median and the upper and lower quartiles of the previous round answers. In all, there were four iterations per question.
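The feedback statistic described here, the median with the upper and lower quartiles of the previous round, can be computed as follows. The source does not say which quartile convention was used, so the `inclusive` method of Python's `statistics.quantiles` is an assumption, and `round_feedback` is a hypothetical helper name.

```python
from statistics import quantiles


def round_feedback(estimates):
    """Summarise one round as the lower quartile, median and upper quartile."""
    lower, mid, upper = quantiles(estimates, n=4, method="inclusive")
    return {"lower_quartile": lower, "median": mid, "upper_quartile": upper}


# Illustrative first-round answers from a group of five.
feedback = round_feedback([120, 150, 200, 260, 900])
```

Note that the outlying answer of 900 does not move any of the three values fed back, which is one practical attraction of quartile-based feedback.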

The general outcome of the experiments can be summarised as follows:

On the initial round a wide spread of individual answers typically occurred:

Considered as an isolated experiment, this result is not statistically significant. The author does not give any details of the analysis done, but a simple non-parametric test (such as the sign test) confirms this result at the 5% level of significance. However, when this experiment is considered along with several others showing similar outcomes, the results appear more significant.

A second experiment was conducted which investigated the effect of feedback as part of a Delphi procedure with estimates derived from face-to-face discussion. Rounds 1 and 2 were part of a Delphi exercise (i.e. medians and quartiles of the first round fed back on the second round). Round 3 was conducted with small groups where discussion was allowed before individual estimates for each question were given. The subgroups were smaller than the original group size of 5 used in the first experiment.

                 Change between      Change between      Change between
                 rounds 1 and 2      rounds 2 and 3      rounds 1 and 3
                 (Delphi method)     (Discussion)        (Face-to-face)
More accurate          8                   9                  11
Same                   8                   3                   0
Less accurate          4                   8                   9

Table 1. A comparison of accuracy of group medians after controlled feedback and after discussion

It is the author's view that these results are not as clear cut as those from the first experiment. The improvement (the difference between the 'more accurate' and 'less accurate' counts) between rounds 1 and 2 is somewhat greater than that between rounds 1 and 3. From this point of view, the overall improvement would have been greater without the discussion.
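The comparison rests on simple arithmetic over the counts in Table 1, which can be laid out explicitly. The dictionary keys below are shorthand labels chosen for this sketch; the counts are those given in the table.

```python
# Counts from Table 1: questions whose group median became more accurate,
# stayed the same, or became less accurate.
table_1 = {
    "rounds 1 to 2 (Delphi)": {"more": 8, "same": 8, "less": 4},
    "rounds 2 to 3 (discussion)": {"more": 9, "same": 3, "less": 8},
    "rounds 1 to 3": {"more": 11, "same": 0, "less": 9},
}

# Net improvement = more accurate minus less accurate.
net = {label: row["more"] - row["less"] for label, row in table_1.items()}

# Every column should account for the same number of questions.
totals = {label: sum(row.values()) for label, row in table_1.items()}
```

On these figures the net improvement is 4 for the Delphi step against 2 for the whole sequence including discussion.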

Outcomes of the experiments

The outcomes of these two experiments are in accord with the results obtained by other researchers. In particular, these experiments confirm the negative conclusion that discussion does not display an advantage over statistical aggregation and that, more often than not, it leads to a degradation of group estimates.

Thus these results are highly favourable with respect to the comparison of systematic and controlled interaction as against informal interaction.

There is good evidence to assume that the distributions of round 1 and round 2 estimates from a Delphi exercise are quite close to the lognormal distribution and that, hence, a reasonable scaling of individual answers would be a logarithmic transformation.
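A brief sketch of what logarithmic scaling means in practice: answers are compared on a log scale, so that an estimate ten times too high and one ten times too low are treated as equally far from the centre. The values are invented, and taking the mean of the logged answers (the geometric mean) as the centre is an assumption of this sketch rather than a step stated in the source.

```python
import math
from statistics import mean

# Illustrative answers spanning two orders of magnitude.
estimates = [100, 1_000, 10_000]

# Logarithmic transformation of each individual answer.
logged = [math.log(e) for e in estimates]

# Centre on the log scale, mapped back to the original units.
log_scale_centre = math.exp(mean(logged))

# Centre on the raw scale, pulled upward by the largest answer.
raw_centre = mean(estimates)
```

On the raw scale the centre is dominated by the largest answer, whereas on the log scale it sits at the middle answer.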

The distribution of round 2 responses (estimates) shows a shift toward the distribution (lognormal) mean and this in turn represents a convergence of answers toward the group response. To summarise, the change from round 1 to round 2 indicates a large improvement in individual estimates due to iteration.

Much of this change must be attributable to convergence i.e. to individuals whose first round answers were highly divergent from the group median and who improved by moving toward the median on subsequent rounds.

In fact further analysis shows that the average error on round 1 is a linear function of the dispersion (spread) of the answers and that the average amount of change of opinion between round 1 and 2 is a well-behaved function of two parameters, namely the distance of the first round answer from the group median and the distance from the true answer. The likelihood of a change of estimate is very nearly a linear function of the distance from the median.

It must be noted that the results above mainly concern the estimates of individual respondents. When the group responses (defined as the final round median of the individual responses) are analysed, the picture is the same but with significant differences in degree of the effect.

Data from the 3 experiments described above were used to generate distributions of first and second round group estimates for 287 questions. The table below shows the data on changes with regard to individual questions presented to the groups.

Improvement with iteration and feedback

More accurate                 89
Same                          80
Less accurate                 51
Total number of questions    220

Table 2. Data on changes with regard to individual questions presented to the groups

For about 64% of the changed estimates (89/(89+51)), the median improved in accuracy and for 36% (51/(89+51)) the median became less accurate. Perhaps, more impressively, in none of the eleven groups represented in the above table, did the number of decreases in accuracy exceed the number of increases.
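The percentages follow directly from the counts in Table 2. The sketch below reproduces them and, as an addition not reported in the source for this table, applies an exact two-sided sign test to the changed estimates; `sign_test_p` is a hypothetical helper and the choice of test is an assumption.

```python
from math import comb

# Counts from Table 2.
more, same, less = 89, 80, 51

changed = more + less
share_improved = more / changed
share_worse = less / changed


def sign_test_p(wins, losses):
    """Exact two-sided sign test p-value, with ties dropped beforehand."""
    n = wins + losses
    k = max(wins, losses)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * tail)


p_value = sign_test_p(more, less)
```

The test asks how likely a split at least as uneven as 89 to 51 would be if improvement and deterioration were equally probable.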

This result alone furnishes the basis for presuming the Delphi method to be useful. An inspection of the cumulative distributions of group errors on rounds 1 and 2 shows that on the second round there is a higher proportion of answers with lower errors.

This is equivalent to saying that the second round answers are better than the first round answers. In short, the interaction step brought about an improvement in accuracy which was accompanied by convergence. These two forces at work show the level of success with which the Delphi mechanism can be dismantled, scientifically modelled and understood.

Application to the Voctade project

In 1969 Dalkey presented the application of his procedure in these terms:

In addition to questioning the effects on free expression of opinion and group acceptance, it still must be asked whether the use of iteration and controlled feedback have anything to offer over the 'mere' statistical aggregation of opinions. In the area of opinion much can be gained by the simple arithmetical pooling of individual opinions as shown above. To get some measure of the value of the procedures, and also to obtain, as a basis for improving the procedures, some insight into the information processes that occur in a Delphi exercise, a rather extensive series of experiments was undertaken at RAND starting in the spring in 1968.

The participants were paid. Typical of the almanac type of questions were:

'How many telephones were in use in Africa in 1965?'
'How many suicides were reported in the USA in 1967?'
'How many women marines were there at the end of World War II?'

This type of material was selected for a variety of reasons:
Questions were wanted where the subjects did not know the answer but had sufficient background information to make an informed estimate.
Questions were wanted where there was a verifiable answer to check the performance of individuals and groups.
Questions were wanted with numerical answers so a reasonably wide range of performance could be scaled.

A question that must also be considered is whether results obtained with this very restricted type of subject matter apply to other kinds of material; but the general-information type of question used had many of the features ascribable to opinion: namely, the subjects did not know the answer, they did have other relevant information that enabled them to make estimates, and the route from 'other relevant information' to an estimate was neither immediate nor direct.

In 1996 he communicated to the Voctade teams:

As of yet I have not found a practical way to identify the individual information system underlying an expert's judgement, and thus the demonstration that a group judgement can be better than that of any single expert can only be taken as 'reassurance.' In practice, where the only available data are the individual judgements (plus self-ratings), the empirical results, and the elementary fact that the error of the mean is always less than or equal to the average error of the individuals (equality only when there is complete unanimity) appear sufficient to justify using a group judgement.
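The 'elementary fact' about the error of the mean can be illustrated numerically. The sketch assumes that error is measured as absolute distance from the true answer, which the passage does not specify; the estimates are invented and the two helper names are hypothetical.

```python
from statistics import mean


def error_of_mean(estimates, true_value):
    """Error of the group judgement, taken as the mean of the estimates."""
    return abs(mean(estimates) - true_value)


def mean_error(estimates, true_value):
    """Average of the individual errors."""
    return mean([abs(e - true_value) for e in estimates])


true_value = 100

# Divergent judgements: individual errors partly cancel in the mean.
divergent = [60, 90, 130]
group_err = error_of_mean(divergent, true_value)
individual_err = mean_error(divergent, true_value)

# Complete unanimity: the two quantities coincide.
unanimous = [80, 80, 80]
```

With divergent judgements the group error is well below the average individual error, because overestimates and underestimates offset one another.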


Use in the Voctade report

Although further work has been carried out into the understanding of Delphi by others since Dalkey's research began, for the purposes of the Voctade project the results described above are sufficient motivation to justify its use.

However, there are some differences in procedure that are being incorporated into the Voctade project. Firstly, it is intended not to feed back the median estimate but to generate a 'weighted' average of the individual estimates i.e. experts will not be assumed to be equally qualified in their opinions.
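A weighted average of this kind might be computed as sketched below. The text does not say how the weights are to be obtained, so the weights here are purely hypothetical (they could, for example, be derived from self-ratings), and `weighted_average` is an illustrative helper rather than the Voctade procedure.

```python
def weighted_average(estimates, weights):
    """Weighted mean of the estimates; larger weights count for more."""
    if not estimates or len(estimates) != len(weights):
        raise ValueError("need one weight per estimate")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(e * w for e, w in zip(estimates, weights)) / total


# Four experts, as in a Voctade national group; values are invented.
estimates = [1000, 1200, 900, 2000]
weights = [4, 3, 2, 1]  # hypothetical: the last expert is least sure

weighted = weighted_average(estimates, weights)
unweighted = weighted_average(estimates, [1, 1, 1, 1])
```

Giving the outlying estimate a low weight pulls the group figure back toward the other three experts.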

To this end it is worth noting that, as regards the exact number of iterations (rounds) necessary to complete a Delphi exercise, there is no predeterminable cut-off point. The examiner must use his or her intuition as to when the best convergence has been reached. Beyond this, an overshoot can occur and any more inputs will have a negative effect, causing a degradation in the group estimate.

Secondly, the original body of data collected at RAND involved groups of 20 individuals. For this project each group will comprise four experts. Dalkey developed a relationship between group error (average of individual errors) and group size.

The curve indicates a monotonically decreasing relationship - as group size increases, the average group error (distance from the true answer) decreases as would be expected. This is not a problem in itself - it is expected that for a group size of four the Delphi method should be successful, but perhaps not as rapidly as for a group size of 20.
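The general shape of such a relationship can be reproduced with a small simulation. This is not Dalkey's curve or data: it assumes lognormally distributed estimates centred on the true answer, takes the group median as the group response, and measures error on a log scale. The function name, the spread `sigma` and the fixed random seed are all choices made for this sketch.

```python
import random
from statistics import mean, median


def average_group_error(group_size, trials=2000, sigma=1.0, seed=0):
    """Average absolute log error of the group median over simulated groups."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        # Each value is log(estimate / true answer) for one respondent.
        log_ratios = [rng.gauss(0.0, sigma) for _ in range(group_size)]
        errors.append(abs(median(log_ratios)))
    return mean(errors)


errors_by_size = {n: average_group_error(n) for n in (1, 4, 20)}
```

Under these assumptions a group of four already removes a good part of the single-respondent error, with further but smaller gains at twenty.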

In practice the Voctade use of the Delphi methodology is achieved by dividing the statistics to be established into 64 cells. The statistics are first divided into 16 national groupings (the 15 member states, with Belgium counted as two), and each national grouping is divided into the four groupings which research in this field in the period 1976-1996 had shown were characteristic of national distance education systems.

It is accepted that the categories are not totally exclusive and that in theory it is possible to include a structure that might be outside the 4 structures chosen. In practice it is felt that each citizen enrolled in a distance course can, in fact, be allocated to one of the four categories.

The goal was, therefore, to find 4 experts in each of the 16 national systems. It was felt that a grouping of four experts, each of whom was an expert in his or her own section and all of whom were experts on the national scene, was the ideal for establishing accurate statistical and financial data, if such experts could be identified and if they agreed to participate.

Cell analysis

Implementation

The implementation of the Delphi methodology in the Voctade project is achieved by cell analysis.

The data to be collected is divided into cells and Delphi experts are identified for each cell.

Each national system is composed of a cluster of cells. This is because it is a focus of the Voctade project to:

Thus there are 16 national clusters of cells to be studied, as it is best to study Belgium FL and Belgium FR as two different systems.

As it has been posited above that each student enrolled in an EU distance training programme is considered to be enrolled in one of the categories of the Voctade typology:

it follows that there are 64 cells for statistical and financial analysis and that 64 Delphi experts are needed, four in each country.
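The cell structure is a simple cross product of national systems and institutional models. In the sketch below the model labels are taken from the case-study headings of this chapter; the list of national systems assumes the 15 EU member states of 1997 with Belgium split into FL and FR, which the text implies but does not enumerate.

```python
from itertools import product

# Assumed list: the 15 EU member states of 1997, Belgium counted twice.
national_systems = [
    "Austria", "Belgium FL", "Belgium FR", "Denmark", "Finland", "France",
    "Germany", "Greece", "Ireland", "Italy", "Luxembourg", "Netherlands",
    "Portugal", "Spain", "Sweden", "United Kingdom",
]

# Labels from the case-study headings in this chapter.
models = [
    "Distance teaching universities",
    "Distance education provision from conventional universities",
    "Government distance education colleges",
    "Proprietary distance education colleges",
]

# One cell, and hence one Delphi expert, per (system, model) pair.
cells = list(product(national_systems, models))
cells_per_system = len(cells) // len(national_systems)
```

Each tuple in `cells` identifies one unit for statistical and financial analysis.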

The ideal structure is one Delphi expert for each of the four national cells, each of whom is also knowledgeable about all four national cells and can participate in the evaluation of the accuracy of the data for each of them.

Once the Delphi experts have been selected, the methodology proceeds by round after round of data analysis so that the Delphi groups can get ever more precise agreement on the data presented to them.

In May 1997 the Delphi group methodology was widened to the whole of Europe. A penultimate Delphi round was held in April-May 1997 and, on the conclusion of the process, the agreed statistical data was put up on the World Wide Web in late May 1997, with an invitation to scholars all over Europe (and the world) to comment on and correct the data, and to point out lacunae or misinterpretations, in the period 1 June 1997 to 31 October 1997.

In November 1997 the final Delphi round of the Voctade project was held when the Delphi group evaluated and finalised the statistical data in the light of the Delphi process and the WWW exposure and contribution.

System diversity and case studies

All the 16 EU national systems share the four models analysed above and it is claimed here that the four models encompass all the structures for distance training in the EU in 1997 with these exceptions:
  1. The radio-based and television-based distance training systems chosen for analysis in this report (Teleac in the Netherlands, and Funkkolleg and Telekolleg in Germany) lie outside the four models delineated.
  2. Model 4, the provision of distance education from an ordinary university, has many sub-models in the different EU countries.
The methodology chosen now proceeds to the analysis of individual institutions to show the diversity of national offering, the range and variety of national systems and the way the four models are found in the 16 EU countries.

Clearly it is impossible to analyse each and every institution in the EU that teaches at a distance, and clearly there would be little or no added value in doing so. For that reason a selection of case studies is provided to demonstrate system diversity. There are two other reasons for presenting case studies.

  1. It emphasises again that no harmonisation is intended in the delineation of the four Voctade institutional models and that the focus is on the richness and uniqueness of national provision.
  2. Government planners, educational researchers and the public will gain a concept of the day-to-day running of a distance teaching organisation; many know little of the work and structure of a government distance training college or even of the running of an open university.
Four institutions from each of the four models described above have been chosen for case studies. They are:

Distance teaching universities
1 Universidad Nacional de Educación a Distancia, Madrid, Spain
2 Open University of the United Kingdom, Milton Keynes
3 The Hellenic Open University, Patras, Greece
4 Universitat Oberta de Catalunya, Barcelona, Spain

Distance education provision from conventional universities
1 The University of Oulu, Finland
2 Università degli Studi di Roma III, Italy
3 Heriot-Watt University, Edinburgh, Scotland
4 University of Linköping, Sweden

Government distance education colleges
1 CIDEAD, Madrid, Spain
2 Enseignement à distance de la Communauté française de Belgique, Brussels, Belgium
3 Bestuur Afstandsonderwijs, Brussels, Belgium
4 Centre National d'Enseignement à Distance, Poitiers, Rheims, Rennes, Grenoble, Lyons, Lille, Toulouse, and Vanves (Paris), France

Proprietary distance education colleges
1 Leidse Onderwijsinstellingen, Leiden, Netherlands
2 Kilroy's College, Dublin, Ireland
3 Deutsche Weiterbildungsgesellschaft, Pfungstadt, Germany
4 Baltic education, Copenhagen, Denmark

Case studies and analysis of target groups

Distance training is an important area of EU training provision because it is the chosen form of training for nearly 2,000,000 EU citizens per year. It is the normal form of training provision for many citizens who are isolated, for those who are too distant from the institution that provides the particular course they need, for those in full-time employment and for all who cannot meet the time-tabling of lectures, classes, training sessions, practicals or workshop sessions that are a characteristic of other forms of provision. It is the only form of provision for many prisoners, hospitalised, disabled, disadvantaged, shiftworkers and homemakers.

The field, nevertheless, is little known. There has been little analysis of the institutions, of the models, of the patterns of provision, of the courses, of the certification, of the methods used, of the technologies chosen, of the success or failure of students in this form of provision.

The target groups for whom it can be claimed that distance training is the only form of vocational education and training (VET) provision are of particular importance for the Voctade study.

For this reason case studies have been undertaken on three target groups which depend greatly on distance training: