It is often said that statistical data on conventional face-to-face training are difficult to obtain because EU governments do not all collect and publish data in the same form on the complex world of vocational, technical and professional training systems.
It has been said that collecting and analysing data on distance vocational, technical and professional training systems would be even harder than collecting data on conventional face-to-face instructor-led systems. A priori this is true, because distance students, in the main, study in the anonymity of their own homes and do not attend the European training centres, colleges, employment and training agencies that conventional training students do.
A methodology was required that would accurately analyse and quantify a complex field spread over the whole of the European Union.
The Delphi method, developed by Professor Norman C Dalkey and his associates at the University of California at Los Angeles in the mid to late 1960s and justified in empirical studies in the 1970s and later, is designed for
Dalkey and his associates established the methodology in a series of publications among which the most important are:
Dalkey, N (1969) An experimental study of group opinion. Futures 1, 5, 408-426
Dalkey, N (1976) Group decision analysis, in Zeleny, M (ed) Multi criteria decision making. Amsterdam: Springer
Dalkey, N (1979) Group decision theory. UCLA-ENG-7749
Dalkey, N (1980) The aggregation of probability estimates. UCLA-ENG-CSL-8025
Dalkey, N et al (1972) Studies in the quality of life. Lexington, Mass.: Heath
Linstone, H and Turoff, M (eds) (1975) The Delphi method. Reading, Mass.: Addison-Wesley
Dalkey, N, Brown, B and Cochran, S. The Delphi Method III: use of self-ratings to improve group estimates. RM-6115-PR (Rand Corporation)
Dalkey, N. Analyses from a group opinion study. Futures 1, 6, 541-551
In 1970 Dalkey and his co-workers, Brown and Cochran, presented the Delphi methodology thus:
The Delphi method is a set of procedures for formulating a group judgement for subject matter where precise information is lacking. In general, the procedures consist of obtaining individual answers to pre-formulated questions either by questionnaire or by some other formal communication technique; iterating the questionnaire one or more times where the information feedback between rounds is carefully controlled by the exercise manager; taking as the group response a statistical aggregate of the final answers.
In previous studies it has been shown that the Delphi procedures led to increased accuracy of group responses more often than not, and that both the spread of answers (standard deviation of responses on a given question) and a self-rating index (average of individual self-ratings on a given question) are valid indicators of the mean accuracy of group responses.
In a major article in 1969 Dalkey had presented the methodology:
The Delphi technique is a method of eliciting and refining group judgements. The rationale for the procedures is primarily the age-old adage 'Two heads are better than one', when the issue is one where exact knowledge is not available. The procedures have three features:
Anonymous response - opinions of members of the group are obtained by formal questionnaire.
Iteration and controlled feedback - interaction is effected by a systematic exercise conducted in several iterations, with carefully controlled feedback between rounds.
Statistical group response - the group opinion is defined as an appropriate aggregate of individual opinions on the final round.
These features are designed to minimise the biasing effects of dominant individuals, of irrelevant communications, and of group pressure towards conformity.
In the spring of 1968, a series of experiments was initiated at RAND to evaluate the procedures. The experiments were also designed to explore the nature of the information processes occurring in the Delphi interaction. The experiments were conducted using upper-class and graduate students from UCLA as subjects and general information of the almanac type as subject matter. Ten experiments, involving 14 groups ranging in size from 11 to 30 members, were conducted. About 13,000 answers to some 350 questions were obtained.
The two basic issues being examined were a comparison of face-to-face discussion with the controlled feedback interaction, and a thorough evaluation of controlled feedback as a technique for improving group estimates. The results indicated that, more often than not, face-to-face discussion tended to make the group estimates less accurate, whereas, more often than not, the anonymous controlled feedback procedure made the group estimates more accurate. The experiments thus put the application of Delphi techniques in areas of partial information on much firmer ground.
Of greater long-range significance is the insight gained into the nature of the group information processes. Delphi procedures create a well-defined process that can be described quantitatively. In particular, the average error on round one is a linear function of the dispersion of the answers. The average amount of change of opinion between round one and round two is a well-behaved function of two parameters - the distance of the first-round answer from the group median, and the distance from the true answer.
In a series of telephone discussions in 1996 Professor Dalkey, now Professor Emeritus of Cognitive Systems at the University of California at Los Angeles, confirmed the suitability of the Delphi methodology for the Voctade project.
The Delphi method is a set of procedures used for formulating factual group judgements in the areas of management forecasting, broad or long range policy creation and in the prediction of logical events.
It is a technique for eliciting and refining the group judgements of recognised experts in situations where exact knowledge is not available but where each member of the group holds a rich set of partial information.
Thus, although the experts might not know the answer, they do have other relevant information that enables them to make estimates. The cognitive route from 'other relevant information' to an estimate is neither immediate nor direct, which ensures that the exercise remains in the realm of opinion.
Clearly the systematic and controlled interactive form of a Delphi exercise is in marked contrast to the informal and 'loose' interaction of traditional round-the-table face-to-face discussions. The work of Dalkey and others has been concerned with the validation of the Delphi technique and its improvement by identifying aspects of its structure that can be scientifically modelled and/or optimised. They have broken down the procedure into three separate functional parts: anonymous response, iteration with controlled feedback, and statistical group response.
The traditional way of pooling individual opinions within a group is by face-to-face discussion. Numerous studies by psychologists in the two decades preceding Dalkey's work (circa 1969) had demonstrated some serious difficulties with this type of interaction. Among the most serious are the influence of dominant individuals, irrelevant communications, and group pressure towards conformity.
The features of the Delphi method are designed to minimise these negative effects. In short, Delphi seeks to use group information more effectively. The statistical group response ensures that the opinion of every member of the group is represented in the final response.
As well as being in accord with good statistical practice, this is also good for group morale. Along with the 'scientific' advantages of a Delphi exercise already mentioned, there are several other properties that should be noted. The procedure is, above all, a rapid and relatively efficient way to 'cream the tops of the heads' of a group of knowledgeable people whilst at the same time generally involving much less effort on the part of the respondent.
A Delphi exercise, properly conducted, can be a highly motivating environment for the respondents and the novelty of the feedback can be interesting to all. There is a reassuring air of objectivity to the outcomes whether they be spurious or not. Finally, the anonymity of the exercise allows a sharing of responsibility and releases the respondents' inhibitions.
Practitioners of Delphi have noted that there is almost always a greater acceptance of the results on the part of the group than of any consensus arrived at by more direct forms of interaction. These features are desirable in any group setting, especially if the exercise is conducted in the context of policy formulation, where group acceptance is an important consideration.
The results indicated that, more often than not, face-to-face discussions tended to make the group estimates less accurate whereas, more often than not, the Delphi procedure led to increased accuracy in group responses. Also, it was shown that for a given question, by combining the spread of answers (defined as the standard deviation of individual responses/estimates within the group) and a group self-rating index (average of individual self-ratings for a given question), a meaningful estimate of the mean accuracy of a group response could be obtained.
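The two indicators described above can be computed directly. The following is a minimal Python sketch rather than Dalkey's own procedure: the sample answers, the 1 to 5 self-rating scale and the function names are assumptions made for illustration, and the source does not say how the two indicators were combined into a single estimate of accuracy.

```python
from statistics import mean, stdev


def spread(estimates):
    """Spread of answers: sample standard deviation of the estimates."""
    return stdev(estimates)


def self_rating_index(self_ratings):
    """Group self-rating index: average of the individual self-ratings."""
    return mean(self_ratings)


# Hypothetical answers from five respondents to one almanac-type question.
estimates = [120, 150, 200, 90, 140]
# Hypothetical self-ratings (assumed scale: 1 = pure guess, 5 = very sure).
self_ratings = [2, 3, 1, 4, 3]

print(f"spread: {spread(estimates):.1f}")
print(f"self-rating index: {self_rating_index(self_ratings):.1f}")
```

A wide spread combined with a low self-rating index would mark a question on which the group response deserves less confidence.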
The realisation that Delphi techniques could be applied more widely came in two stages. In experiments at RAND it first had to be shown that, more often than not, the group response after face-to-face discussion was less accurate than a simple median of individual estimates obtained in a single round of questioning without any preceding discussion.
Dalkey gives a heuristic argument for choosing the median as a suitable summary statistic. Consider the case where you have a group of equally qualified experts indistinguishable in a statistical sense i.e. you have no way of asserting that one expert is more knowledgeable than another. Each provides a numerical estimate to a particular question.
It is clear that, independent of the distribution of answers and independent of the location of the true answer, the median of the individual estimates is at least as close to the true value as one half of the group answers. If the range of responses includes the true answer, then, in general the median is closer to the true answer than more than half of the group. In practical situations the range of answers is very likely to include the true answer, in which case the latter assertion is valid. So, much can be gained by the simple arithmetical (statistical) pooling of individual responses.
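The median argument can be checked numerically. The sketch below, in Python, counts the share of individual estimates that are no closer to the true value than the group median is. The estimates and true values are hypothetical, chosen so that the true value falls both inside and outside the range of answers.

```python
from statistics import median


def share_no_closer_than_median(estimates, true_value):
    """Fraction of estimates whose error is at least as large as the median's."""
    median_error = abs(median(estimates) - true_value)
    no_closer = sum(
        1 for x in estimates if abs(x - true_value) >= median_error
    )
    return no_closer / len(estimates)


# Hypothetical answers from a group of five respondents.
estimates = [40, 55, 70, 90, 300]

# Two true values inside the range of answers and one far outside it.
for true_value in (60, 75, 1000):
    share = share_no_closer_than_median(estimates, true_value)
    print(true_value, share)
```

In every case the share is at least one half, as the argument requires, whatever the distribution of the answers.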
In order to compare Delphi with traditional methods, Dalkey used the following set-up. His first experiment involved ten volunteer students at the RAND Corporation. These were divided into two groups of five. Twenty almanac-type questions (for which the answers were already known) were presented in four blocks of five, using an ABBA test design for one group (A denoting face-to-face discussion and B denoting questionnaire feedback) and the reverse (BAAB) for the other.
Thus, each group answered ten questions in discussion sessions and ten in individual questionnaire sessions. The experimental design employed is a modified cross-over design, which has specific statistical advantages in terms of inferential power and accuracy. The basic feedback between rounds was the median and the upper and lower quartiles of the previous round's answers. In all, there were four iterations per question.
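The feedback passed to respondents between rounds can be sketched as follows. This is an illustrative Python example only: the answers are hypothetical, and the quartile convention (the "inclusive" method of the standard library) is an assumption, since the source does not say which convention was used at RAND.

```python
from statistics import median, quantiles


def round_feedback(answers):
    """Return (lower quartile, median, upper quartile) of one round's answers."""
    # quantiles() with n=4 returns the three cut points between quarters.
    lower, _, upper = quantiles(answers, n=4, method="inclusive")
    return lower, median(answers), upper


# Hypothetical first-round answers from a group of five.
first_round = [10, 20, 30, 40, 50]
lower, mid, upper = round_feedback(first_round)
print(f"lower quartile {lower}, median {mid}, upper quartile {upper}")
```

Each respondent would see these three figures, and nothing that identifies the other respondents, before giving a revised estimate.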
The general outcome of the experiments can be summarised as follows:
On the initial round a wide spread of individual answers typically occurred:
A second experiment was conducted to compare the effect of feedback as part of a Delphi procedure with estimates derived from face-to-face discussion. Rounds 1 and 2 were part of a Delphi exercise (i.e. the medians and quartiles of the first round were fed back on the second round). Round 3 was conducted with small groups in which discussion was allowed before individual estimates for each question were given. The subgroups were smaller than the original group size of 5 used in the first experiment.
[Table: number of estimates that became more accurate, stayed the same, or became less accurate between rounds; the cell values are missing from the source.]
It is the author's view that these results are not as clear cut as those from the first experiment. The improvement (the difference between the numbers of more accurate and less accurate estimates) between rounds 1 and 2 is somewhat greater than that between rounds 1 and 3. From this point of view, the overall improvement would have been greater without the discussion.
Thus these results are highly favourable with respect to the comparison of systematic and controlled interaction as against informal interaction.
There is good evidence that the distributions of round 1 and round 2 estimates from a Delphi exercise are quite close to the lognormal distribution, and hence that a reasonable scaling of individual answers would be a logarithmic transformation.
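A logarithmic scaling of this kind can be sketched in a few lines of Python. The estimates are hypothetical, and the use of the geometric mean as the summary on the log scale is an assumption made for illustration; the source does not say which statistic was computed on the transformed answers.

```python
import math
from statistics import mean


def log_scaled(estimates):
    """Natural logarithms of the estimates (all must be positive)."""
    return [math.log(x) for x in estimates]


def geometric_mean(estimates):
    """Mean taken on the log scale and transformed back to the original scale."""
    return math.exp(mean(log_scaled(estimates)))


# Hypothetical answers spanning two orders of magnitude.
estimates = [100, 1000, 10000]
print(round(geometric_mean(estimates)))  # central value on the log scale
print(round(mean(estimates)))            # arithmetic mean, pulled upwards
```

On the log scale an answer ten times too large and an answer ten times too small count as equally wrong, which suits almanac-type quantities whose plausible values range over several orders of magnitude.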
The distribution of round 2 responses (estimates) shows a shift toward the distribution (lognormal) mean and this in turn represents a convergence of answers toward the group response. To summarise, the change from round 1 to round 2 indicates a large improvement in individual estimates due to iteration.
Much of this change must be attributable to convergence i.e. to individuals whose first round answers were highly divergent from the group median and who improved by moving toward the median on subsequent rounds.
In fact further analysis shows that the average error on round 1 is a linear function of the dispersion (spread) of the answers and that the average amount of change of opinion between round 1 and 2 is a well-behaved function of two parameters, namely the distance of the first round answer from the group median and the distance from the true answer. The likelihood of a change of estimate is very nearly a linear function of the distance from the median.
It must be noted that the results above mainly concern the estimates of individual respondents. When the group responses (defined as the final round median of the individual responses) are analysed, the picture is the same but with significant differences in degree of the effect.
Data from the 3 experiments described above were used to generate distributions of first and second round group estimates for 287 questions. The table below shows the data on changes with regard to individual questions presented to the groups.
[Table: changes in accuracy of the group median between rounds 1 and 2, by group; the cell values are missing from the source.]
For about 64% of the changed estimates (89/(89+51)), the median improved in accuracy and for 36% (51/(89+51)) the median became less accurate. Perhaps more impressively, in none of the eleven groups represented in the above table did the number of decreases in accuracy exceed the number of increases.
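The percentages quoted above follow from the two counts given in the text. As a small worked check in Python (the function name is illustrative only):

```python
def shares_of_changes(improved, worsened):
    """Shares of changed group medians that improved and that worsened."""
    changed = improved + worsened
    return improved / changed, worsened / changed


# Counts taken from the text: 89 medians improved, 51 became less accurate.
better, worse = shares_of_changes(89, 51)
print(f"improved: {better:.1%}, less accurate: {worse:.1%}")
```

Of the 140 group medians that changed, the shares are therefore roughly 63.6% and 36.4%, which the text rounds to 64% and 36%.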
This result alone furnishes the basis for presuming the Delphi method to be useful. An inspection of the cumulative distributions of group errors on rounds 1 and 2 shows that on the second round there is a higher proportion of answers with lower errors.
This is equivalent to saying that the second round answers are better than the first round answers. In short, the interaction step brought about an improvement in accuracy which was accompanied by convergence. These two forces at work show the level of success with which the Delphi mechanism can be dismantled, scientifically modelled and understood.
In addition to questioning the effects on free expression of opinion and group acceptance, it must still be asked whether the use of iteration and controlled feedback has anything to offer over the 'mere' statistical aggregation of opinions. In the area of opinion much can be gained by the simple arithmetical pooling of individual opinions, as shown above. To get some measure of the value of the procedures and, as a basis for improving them, some insight into the information processes that occur in a Delphi exercise, a rather extensive series of experiments was undertaken at RAND starting in the spring of 1968.
The participants were paid. Typical of the almanac type of questions were:
'How many telephones were in use in Africa in 1965?'
'How many suicides were reported in the USA in 1967?'
'How many women marines were there at the end of World War II?'
This type of material was selected for a variety of reasons:
Questions were wanted where the subjects did not know the answer but had sufficient background information to make an informed estimate.
Questions were wanted where there was a verifiable answer to check the performance of individuals and groups.
Questions were wanted with numerical answers so a reasonably wide range of performance could be scaled.
A question that must also be considered is whether results obtained with this very restricted type of subject matter apply to other kinds of material; but the general-information type of question used had many of the features ascribable to opinion: namely, the subjects did not know the answer, they did have other relevant information that enabled them to make estimates, and the route from 'other relevant information' to an estimate was neither immediate nor direct.
In 1996 Dalkey communicated to the Voctade teams:
As of yet I have not found a practical way to identify the individual information system underlying an expert's judgement, and thus the demonstration that a group judgement can be better than that of any single expert can only be taken as 'reassurance.' In practice, where the only available data are the individual judgements (plus self-ratings), the empirical results, and the elementary fact that the error of the mean is always less than or equal to the average error of the individuals (equality only when there is complete unanimity) appear sufficient to justify using a group judgement.
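The 'elementary fact' referred to in this communication can be illustrated numerically. The Python sketch below assumes that error is measured as squared error, which the source does not state. Under that assumption the average squared error of the individuals equals the squared error of the group mean plus the variance of the individual estimates, so the two are equal only when every estimate is the same. The estimates and the true value are hypothetical.

```python
from statistics import mean, pvariance


def squared_error_of_mean(estimates, true_value):
    """Squared error of the group mean."""
    return (mean(estimates) - true_value) ** 2


def mean_squared_error(estimates, true_value):
    """Average of the individual squared errors."""
    return mean([(x - true_value) ** 2 for x in estimates])


# Hypothetical estimates from three experts and a hypothetical true value.
estimates = [80, 100, 150]
true_value = 120

group = squared_error_of_mean(estimates, true_value)
individual = mean_squared_error(estimates, true_value)

# The gap between the two is the (population) variance of the estimates.
print(group, individual, pvariance(estimates))
```

When the experts are unanimous the variance is zero and the two errors coincide; otherwise the group mean has the smaller error.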
However, there are some differences in procedure that are being incorporated into the Voctade project. Firstly, it is intended not to feed back the median estimate but to generate a 'weighted' average of the individual estimates i.e. experts will not be assumed to be equally qualified in their opinions.
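A weighted average of this kind might be computed as in the following Python sketch. The estimates and weights are hypothetical, and the source does not say how the Voctade weights were derived; basing them on self-ratings would be one possibility, offered here only as an assumption.

```python
def weighted_average(estimates, weights):
    """Weighted average of the experts' estimates."""
    if len(estimates) != len(weights):
        raise ValueError("one weight is needed for each estimate")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(e * w for e, w in zip(estimates, weights))
    return weighted_sum / total_weight


# Hypothetical enrolment estimates from the four experts of one group.
estimates = [1000, 1200, 900, 1500]
# Hypothetical weights reflecting how well qualified each expert is.
weights = [4, 3, 2, 1]

print(weighted_average(estimates, weights))
```

With equal weights the function reduces to the ordinary arithmetic mean, so the unweighted case remains available as a check.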
It is also worth noting that there is no predeterminable cut-off point for the number of iterations (rounds) needed to complete a Delphi exercise. The examiner must use his or her intuition as to when the best convergence has been reached. Beyond this point an overshoot can occur, and any further rounds will degrade the group estimate.
Secondly, the original body of data collected at RAND involved groups of 20 individuals. For this project each group will comprise four experts. Dalkey developed a relationship between group error (average of individual errors) and group size.
The curve indicates a monotonically decreasing relationship: as group size increases, the average group error (distance from the true answer) decreases, as would be expected. This is not a problem in itself. It is expected that for a group size of four the Delphi method should be successful, but perhaps not as rapidly as for a group size of 20.
In practice the Voctade use of the Delphi methodology is achieved by dividing the statistics to be established into 64 cells. The statistics are first divided into national groupings, one for each of the 15 member states except Belgium, which has two, giving 16 national systems. Each national grouping is then divided into the four groupings which research in this field in the period 1976-1996 had shown to be characteristic of national distance education systems.
The goal was, therefore, to find 4 experts in each of the 16 national systems. It was felt that a grouping of four experts, each of whom was an expert in his or her own section and all of whom were experts on the national scene, was the ideal for establishing accurate statistical and financial data, if such experts could be identified and if they agreed to participate.
The data to be collected is divided into cells and Delphi experts are identified for each cell.
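The cell structure can be sketched in Python as follows. The list of national systems assumes the 15 EU member states of the period, with Belgium divided into its French and Flemish Communities, and the four category names are taken from the headings used for the case studies later in this section; both are assumptions about how the project labelled its cells.

```python
from itertools import product

# Assumed list: the 15 EU member states, with Belgium counted twice.
NATIONAL_SYSTEMS = [
    "Austria", "Belgium (French Community)", "Belgium (Flemish Community)",
    "Denmark", "Finland", "France", "Germany", "Greece", "Ireland", "Italy",
    "Luxembourg", "Netherlands", "Portugal", "Spain", "Sweden",
    "United Kingdom",
]

# Assumed category names, taken from the case-study headings.
CATEGORIES = [
    "Distance teaching universities",
    "Distance education provision from conventional universities",
    "Government distance education colleges",
    "Proprietary distance education colleges",
]

# One cell for every combination of national system and category.
cells = list(product(NATIONAL_SYSTEMS, CATEGORIES))
print(len(NATIONAL_SYSTEMS), len(CATEGORIES), len(cells))
```

Sixteen national systems and four categories give the 64 cells, and one Delphi expert is sought for each.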
Each national system is composed of a cluster of cells. This is because it is a focus of the Voctade project to:
As has been posited above, each student enrolled in an EU distance training programme is considered to be enrolled in one or other of the categories of the Voctade typology:
The ideal structure is one Delphi expert for each of the four national cells, each of whom is also knowledgeable about all four cells and can participate in evaluating the accuracy of the data for each of them.
Once the Delphi experts have been selected, the methodology proceeds by round after round of data analysis so that the Delphi groups can get ever more precise agreement on the data presented to them.
In May 1997 the Delphi group methodology was widened to the whole of Europe. A penultimate Delphi round was held in April-May 1997. On its conclusion, the agreed statistical data were published on the World Wide Web in late May 1997, with an invitation to scholars throughout Europe (and the world) to comment on the data, correct them, and point out lacunae or misinterpretations during the period 1 June 1997 to 31 October 1997.
In November 1997 the final Delphi round of the Voctade project was held, at which the Delphi group evaluated and finalised the statistical data in the light of the Delphi process and the comments received through the WWW exposure.
Clearly it is impossible to analyse each and every institution in the EU that teaches at a distance, and clearly there would be little or no added value in doing so. For that reason a selection of case studies is provided to demonstrate system diversity. There are two other reasons for presenting case studies.
Distance teaching universities
1 Universidad Nacional de Educación a Distancia, Madrid, Spain
2 Open University of the United Kingdom, Milton Keynes
3 The Hellenic Open University, Patras, Greece
4 Universitat Oberta de Catalunya, Barcelona, Spain
Distance education provision from conventional universities
1 The University of Oulu, Finland
2 Università degli Studi di Roma III, Italy
3 Heriot-Watt University, Edinburgh, Scotland
4 University of Linköping, Sweden
Government distance education colleges
1 CIDEAD, Madrid, Spain
2 Enseignement à distance de la Communauté française de Belgique, Brussels, Belgium
3 Bestuur Afstandsonderwijs, Brussels, Belgium
4 Centre National d'Enseignement à Distance, Poitiers, Rheims, Rennes, Grenoble, Lyons, Lille, Toulouse, and Vanves (Paris), France
Proprietary distance education colleges
1 Leidse Onderwijsinstellingen, Leiden, Netherlands
2 Kilroy's College, Dublin, Ireland
3 Deutsche Weiterbildungsgesellschaft, Pfungstadt, Germany
4 Baltic education, Copenhagen, Denmark
The field, nevertheless, is little known. There has been little analysis of the institutions, of the models, of the patterns of provision, of the courses, of the certification, of the methods used, of the technologies chosen, of the success or failure of students in this form of provision.
The target groups for whom it can be claimed that distance training is the only form of vocational education and training (VET) provision are of particular importance for the Voctade study.
For this reason case studies have been undertaken on three target groups which depend greatly on distance training: