Research students have sometimes spent two or more years collecting large amounts of data before considering how those data would be analysed, and with little understanding of their statistical or computing needs. In a disturbing number of cases the proposed analysis has subsequently had to be severely restricted or even abandoned.
The data may be too massive to handle; more often some essential measurements have not been made, survey questions are ambiguous or otherwise defective, or samples are of insufficient quality or size. Sometimes the standard statistical or numerical methods and the available computer programs are inadequate, and the development of suitable techniques would in itself constitute a PhD project.
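Whether a planned sample is likely to be large enough can often be estimated before any data are collected. The following is a minimal sketch in Python, assuming a two-sided comparison of two group means, equal group sizes, and the normal approximation. The function name `per_group_sample_size` and the effect sizes, significance level and power shown are illustrative assumptions, not recommendations for any particular study.

```python
from math import ceil
from statistics import NormalDist


def per_group_sample_size(effect_size, alpha=0.05, power=0.80):
    """Approximate number of subjects needed in EACH of two groups.

    effect_size is the expected difference in means divided by the
    common standard deviation (a standardised effect size).
    Uses the normal approximation; an exact calculation based on the
    t distribution gives a slightly larger answer.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value, two-sided test
    z_beta = z.inv_cdf(power)             # quantile for the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Hypothetical planning scenarios (assumed values, for illustration only).
n_medium = per_group_sample_size(0.5)   # moderate expected difference
n_small = per_group_sample_size(0.2)    # small expected difference
print(n_medium, n_small)
```

Under these assumptions a moderate effect needs about 63 subjects per group, while a small effect needs about 393. A calculation of this kind, done at the planning stage, shows quickly whether the intended data collection can support the proposed analysis.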
Serious problems can also arise from the use of inappropriate computer software for data analysis. This can lead to results that are demonstrably incorrect or at least suspect, besides risking unfavourable comment from referees and reviewers. For example, statistical procedures provided by spreadsheet and database packages are best treated with caution. Though convenient for commercial use (for which they are chiefly intended), these might not meet the standards required for academic research. They might be based on poor numerical techniques leading to inaccurate statistics, their limitations might be poorly documented if at all, and essential supplementary tests might be omitted. Such packages can be very useful for data entry and management, but for statistical analysis it is always advisable (and often easier) to use software written for the purpose by a reputable specialist manufacturer.
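The risk from poor numerical techniques can be illustrated with a textbook case. The sketch below, in Python, compares a one-pass "sum of squares" formula for the sample variance with the two-pass formula that subtracts the mean first. It is an illustration of the general hazard only and makes no claim about the method used by any particular package; the function names and the data values are invented for the example.

```python
from statistics import variance  # reference implementation in the standard library


def naive_variance(xs):
    """One-pass formula: (sum of squares - (sum)^2 / n) / (n - 1).

    Algebraically correct, but it subtracts two very large, nearly
    equal numbers, so most significant digits can be lost.
    """
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)


def two_pass_variance(xs):
    """Two-pass formula: compute the mean, then sum squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Invented data: small spread around a very large value.
# The deviations from the mean are -6, -3, 3, 6, so the true variance is 90 / 3 = 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print("naive   :", naive_variance(data))
print("two-pass:", two_pass_variance(data))
print("library :", variance(data))
```

With these values the two-pass formula and the library routine return 30, whereas the one-pass formula returns a value that is wrong by a wide margin, because the squares are of order 10^18 and cannot be held exactly in double precision. Data with a large mean and a comparatively small spread (dates stored as day counts, for instance) are the typical trigger.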
University staff, research workers, and supervisors of research students are urged to ensure that methodological and computing requirements for data analysis are thoroughly evaluated at an early stage in the planning of any relevant project and certainly before any substantial resources are spent in collecting data. If necessary, advice should be sought. This is particularly important in the Arts and Social Sciences, where data are intrinsically very complex.