From Technological Somnambulism to Epistemological Awareness: reflections on the impact of computer-aided qualitative data analysis
While HyperRESEARCH has changed a lot since this unpublished 1997 article by Prof. Paul Trowler of the Department of Educational Research at Lancaster University, the basic tenet about the use and effect of Computer Assisted/Aided Qualitative Data Analysis (CAQDAS) in qualitative research still holds. Researchware is proud to present "From Technological Somnambulism to Epistemological Awareness: reflections on the impact of computer-aided qualitative data analysis" with permission of Prof. Trowler. Among the article's many insights is a deft illustration of the use of HyperRESEARCH's unique Hypothesis Tester feature. The original article can be found at http://www.lancs.ac.uk/staff/trowler/hyperres.htm and is reproduced below.
This paper uses evidence from a research project to test the competing propositions and conclusions of two viewpoints on the impact of computer aided qualitative data analysis (CAQDAS). One, the convergence model, suggests that CAQDAS contains inherent characteristics which drive data analysis approaches in a consistent direction. The effect of this is to create homogeneity in analytical approaches which is detrimental in a number of ways. The alternative viewpoint suggests that the development of CAQDAS is only one of a number of options for qualitative data analysis but one which offers unique advantages. I refer to this as the repertoire-enhancement model. The proponents of these contrasting models have so far presented little empirical evidence to support their claims. The project drawn on in this paper concerned the attitudes, values and practices of academics towards curriculum change in a single higher education institution and used a code-and-retrieve software package called HyperResearch as one of its analytical strategies. This experience is used to test the claims made at the level of the individual researcher by the competing positions.
Two views on CAQDAS
There is a lively debate in the literature on computer aided qualitative data analysis (CAQDAS) concerning its impact at both the level of the individual researcher and that of the qualitative research community as a whole. For what we can call the ‘convergence’ school of thought CAQDAS contains intrinsic dangers, now being realised, of approaches to qualitative data analysis converging as researchers world-wide adopt a small number of CAQDAS packages. The rich diversity of ‘manual’ analysis approaches that existed in the past is stifled by a technologically determined shift, powered by the sets of assumptions built into the software (Seidel, 1991; Coffey et al, 1996). This is viewed negatively because researchers globally are tending to adopt (software-driven) approaches which are not suitable for their research projects’ aims, subjects and/or data types or which are confined to a simplistic research paradigm. It also has the effect of ‘killing’ the healthy experimentation and growth into diversity in non-computer-based approaches that had occurred naturally to date. In this scenario the words ‘coding’ and ‘analysis’ begin to be used interchangeably at one level, as, at another, do written ‘text analysis’ and ‘qualitative analysis’. The latter is the case because the difficulty the computer has (at least at the moment) in dealing on-line with material other than written text (for example video or graphical data) means that analysts’ focus is inexorably pushed towards the written word. As a result ethnography in particular becomes uni-dimensional both in terms of the variety of data used and the ways these are analysed.
Underpinning this convergence, in this view, is a lack of awareness at the individual researcher level about the impact of CAQDAS on the research approaches and ‘methods of thought’ employed (Walker, 1993, p 92). It is claimed that CAQDAS has a number of effects on the analyst: conditioning how they think about the relationship between their data and ‘reality’; what they come to perceive - and miss - in the data; how far they are aware of and interrogate the varieties of ontological statuses their data may contain; how they come to think about the subjects of their research; what they consider to be ‘valuable’ data; their epistemological and ontological standpoints and, in the end, their research results. For example the ‘neat [data] retrieval process’ that computers make possible may lead the individual researcher into thinking that the original data were similarly ‘neat’ and unproblematic rather than messy and contradictory (Richards and Richards, 1991, p 49). Likewise there is a danger that data may be considered out of context, that fragments of ‘text’, the production of which was intimately linked to a particular social and discursive context, may be treated as ‘free-standing’. Indeed, the danger is that the text begins to assume paramount importance rather than the analyst’s interpretation of it. The ethnographic tradition has been to give central importance to the researcher’s interpretation of the data as s/he goes through the process of interpretation and reflection and linkage with the theoretical and conceptual tradition of the discipline within which s/he is working. Having developed a set of codes to apply to the data, the researcher may begin to lose sight of the problematic nature of discourse and the polysemic character of text: the coding procedure simply becomes a mechanical process of assigning one of N codes to a portion of text.
As a result the analyst’s ontological perspective becomes simplified and, for example, s/he begins to view interview discourse as the more or less perfect outer articulation to an inner world of thought. By contrast the postmodern and discourse analytic approaches to the production of text in interviews, as elsewhere, see it as a more complex process involving a considerable degree of context-related choice (Potter and Wetherell, 1987).
In the convergence model, then, most qualitative researchers are portrayed as technological somnambulists (Pfaffenberger, 1988), instrumentally choosing an easy tool and unaware of or unconcerned about its effects on them. Coffey et al, for example, write:
"It is too easy for there to develop a taken-for-granted mode of data handling....The categorization of textual data and the use of computer software to search for them appear to render the general approach akin to standardized survey or experimental design procedures. In our view qualitative research is not enhanced by poor imitations of other research styles and traditions....The presuppositions and procedures that are inscribed in contemporary software for qualitative data analysis are implicitly driving a renewed orthodoxy that is being adopted in a large number of research sites around the world". (Coffey et al, 1996, paras 7.6 and 1.4)
The ‘repertoire-enhancement’ theorists, by contrast, adopt a positive viewpoint, asserting that the influence of CAQDAS is benign and arguing that at the individual level qualitative researchers are at least epistemologically aware and often epistemologically sophisticated. Researchers have a number of analytical techniques at their disposal and the development of CAQDAS has simply added to their repertoire. As with any other technique, they will select it on the basis of its appropriateness to the task in hand and confine its use to that task (Weaver and Atkinson, 1994; Lee and Fielding, 1996). Indeed, even restricting themselves to computer-based analytical approaches need not limit researchers as these allow considerable diversity of approach both within a single program and across the range of those available. At the research community level, then, the effect of the development of CAQDAS is a beneficial one, offering a further extension of the tool kit available to the qualitative researcher. The advantages CAQDAS offers over other techniques include its ability to deal with a large amount of qualitative data through the automation of several steps in the process of coding and retrieving data, eliminating the laborious writing, highlighting, photocopying, filing and manual retrieval involved in past practices. Secondly it can assist research teams by ensuring that all collaborators share definitions of concepts and approaches to data collection and manipulation. Thirdly it makes the analysis process more explicit and hence leads to an enhancement of the truth claims of qualitative research which, unlike quantitative approaches, has suffered from general doubts about its reliability and validity (Becker et al, 1984; Gittelsohn, 1992). Finally it can improve the write-up of the research by giving easy access to the data and (with hypertext and hypermedia software) make the finished product more flexible, non-linear and interactive.
Lee and Fielding exemplify the repertoire-enhancement approach:
"Instead of insisting on a particular model of analysis, we...see computer-based methods as permitting, if we can use a somewhat inelegant term, the multitooling of qualitative researchers, making available to them more or less at will a wide range of different analytic techniques....Let a hundred flowers bloom!" (Lee and Fielding, 1996, paras 2.2 and 4.5)
This paper, then, aims to explore the competing claims made in this debate about the impact of CAQDAS on research at the individual level using my own experience of its use. The protagonists on each side have set out a number of arguments and claims, but neither side has so far presented much evidence in support of their position. The research project I am using to subject their claims to critical analysis was a five-year ethnographic study of academic staff responses to change in a single higher education institution. It employed CAQDAS in the data analysis phase, as I outline below.
This exploration of the validity of the claims made by the competing sides in the debate is necessarily delimited: in particular I cannot explore the diversity of approaches available within CAQDAS more generally, nor can I comment on the impact on the research community as a whole, though it is possible to make inferences about this from my comments here. The discussion is also limited to the application of one software package.
My research used four main techniques of data collection: 50 intensive semi-structured interviews with academics; observant participation of events over the five year period; other studies of the institution (which had been much studied) and documents generated within and outside the institution. Of course this generated a tremendous amount of data: well over a megabyte of interview transcripts in ASCII format, some fully transcribed professionally and some partially transcribed and annotated by me, plus a very large amount of printed material, field notes, research memos etc. The sheer quantity of data made systematic analysis procedures extremely attractive. Although a computer enthusiast I was a novice as far as CAQDAS was concerned. My purposes in using CAQDAS were entirely pragmatic and I could, at that stage, have been aptly described as a technological somnambulist. Six years on I am able to reflect in a more informed way about the experience, as Weaver and Atkinson among others urge us to do (Weaver and Atkinson, 1994, p 2 and pp 6-7).
The CAQDAS package I chose was HyperResearch, primarily because it runs on a Macintosh platform, was designed mainly for individual rather than collaborative research (unlike NUDIST) and is able to cope with a large amount of data. It is also user-friendly: it requires very little prior work on the data (unlike the ETHNOGRAPH) and is intuitive in use for someone familiar with the Macintosh or Windows environments.
HyperResearch is, of course, only one of a number of packages available (for a description of others see Tesch, 1990; Fielding, 1991; Miles and Huberman, 1994; Weaver and Atkinson, 1994; Aljunid, 1996). If data analysis "consists of three concurrent flows of activity: data reduction, data display, and conclusion drawing/verification" (Miles and Huberman, 1994) then different packages adopt different approaches to and have different strengths in each of these areas. Weaver and Atkinson (1994) identify three different analysis strategies implemented in software: lexical searching; hypertext and code-and-retrieve.
Lexical searching programs such as Key Words in Context (KWIC) search for given words or phrases, or equivalents, in context and offer the researcher the found text in the context in which it was originally produced. This approach is particularly strong in the conclusion drawing/verification area of analysis and also helps to deal with large amounts of data. Hypertext programs such as MARTIN and Guide use the techniques now implemented on the World Wide Web to link documents and passages together and, ideally, use multi-media techniques so that video film, sound and photographs can be used in addition to written text in ethnographic accounts. Particularly useful for data display, this eradicates the somewhat artificial linearity of many ethnographic accounts and empowers the ‘reader’ to take what they want or need from the ‘story’. This approach can also help the researcher to identify patterns in the data which were not immediately apparent, so assisting in the conclusion-drawing aspect of analysis.
HyperResearch by contrast is a code-and-retrieve package. More often than the other types this kind of software is written specifically for data analysis purposes. To use HyperResearch the data for analysis must first be converted into digital form and then ‘coded’. This process involves applying a prescribed or developing set of categories to the data. In HyperResearch it is a technically simple process: the analyst uses the mouse to point and click at the start and end of the passage in the text to be coded and then selects the code to be applied from a scrolling box. New codes are quickly and easily created as necessary. Codes may be of three types, though HyperResearch does not treat them differently: descriptive (for example male); thematic (for example Ideology/New Right), and explanatory (for example Self Presentation/informant, indicating that the respondent was presenting themselves in a particular way in the passage so coded). Having compiled the set of codes and applied them to the data HyperResearch then allows the analyst to do a number of things. These include the ability to:
* retrieve and manipulate all the portions of text ascribed with a particular code and, if required, save the resultant portions of text to a file.
* conduct a Boolean search for all cases of a particular code and/or/not (etc) another code (for example all cases where the code Ideology/New Right and HE Sys/Access/Worries appears in the transcript of an interview with a male academic) and, if required, save the resultant portions of text to a file. This is a way of testing propositions about the data, in this case the idea that male academics who subscribe to New Right ideology will also tend to be antagonistic towards the expansion of access to higher education. This searching process can be made more sophisticated by combining codes with a Boolean operator in brackets, as in the example of rule 4 in figure 1.
* test hypotheses about the overall meaning of the data by developing and testing a set of rules. This involves the researcher developing a hypothesis about the data which takes the form if X then Y. The example used in the HyperResearch manual is from an interview study of young female American university students. The hypothesis was that if the respondents foresee both a successful career and a happy and full family life in the next 20 years without significant conflict between these two goals then they have an unrealistically optimistic view of life (a ‘Cinderella Complex’). To test whether this was true, and of which respondents, it is first necessary to prepare a set of higher order codes, or metacodes. For example the codes I will make high salary and fabulous non-trad job combine to become the metacode HIGH WORK COMMITMENT, similarly gets married/stays married and wants kids are combined to become HIGH FAMILY COMMITMENT etc. In the example provided in the manual there are five metacodes, culminating in the ‘target’ hypothesised if X then Y relationship, the Cinderella Complex, which is signalled by the metacode GOAL REACHED. These are set up as a series of rules for HyperResearch to follow as it interrogates the data shown in figure 1:
Figure 1: An example of a HyperResearch hypothesis rule list.
Rule 1. If I will make high salary and fabulous non-trad job
Then add HIGH WORK COMMITMENT
Rule 2. If gets married/stays married and wants kids
Then add HIGH FAMILY COMMITMENT
Rule 3. If HIGH WORK COMMITMENT and HIGH FAMILY COMMITMENT
Then add HIGH POTEN FOR WRK FAM CONF
Rule 4. If HIGH POTEN FOR WRK FAM CONF and (cmb wrk fam no problems or successful happy life)
Then add CINDERELLA COMPLEX
add GOAL REACHED.
(Source: Hesse-Biber et al, 1991, 4-13)
HyperResearch then tests this rule list against all interview transcripts, adding the metacodes as necessary, and provides a report about which cases meet any or all of the rules including, most importantly, the last one in which the hypothesis is ‘proven’.
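A rule list of this kind behaves like a small forward-chaining inference engine: rules are re-applied until no new metacodes can be added to a case. The following sketch (in Python) is only an illustration of that logic, using the codes from figure 1; the case data are invented and this is not HyperResearch's actual implementation.

```python
# Sketch of the forward-chaining logic behind the Hypothesis Tester.
# The case below (the set of codes applied to one transcript) is hypothetical.
case = {"I will make high salary", "fabulous non-trad job",
        "gets married/stays married", "wants kids",
        "cmb wrk fam no problems"}

# Each rule: a list of conditions (all must hold), where each condition
# is a tuple of alternatives (any may hold), and metacodes to add on firing.
rules = [
    ([("I will make high salary",), ("fabulous non-trad job",)],
     ["HIGH WORK COMMITMENT"]),                                   # Rule 1
    ([("gets married/stays married",), ("wants kids",)],
     ["HIGH FAMILY COMMITMENT"]),                                 # Rule 2
    ([("HIGH WORK COMMITMENT",), ("HIGH FAMILY COMMITMENT",)],
     ["HIGH POTEN FOR WRK FAM CONF"]),                            # Rule 3
    ([("HIGH POTEN FOR WRK FAM CONF",),
      ("cmb wrk fam no problems", "successful happy life")],
     ["CINDERELLA COMPLEX", "GOAL REACHED"]),                     # Rule 4
]

# Re-apply the rules until no new metacodes are added (forward chaining).
changed = True
while changed:
    changed = False
    for conditions, metacodes in rules:
        if all(any(alt in case for alt in cond) for cond in conditions):
            new = set(metacodes) - case
            if new:
                case |= new
                changed = True

print("GOAL REACHED" in case)  # True: the hypothesis is 'proven' for this case
```

For this hypothetical case all four rules fire in turn, so the target metacode GOAL REACHED is added and the case would appear in the final report.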
Tesch (1990) and Richards and Richards (1991) argue that a theory-building program must be able to do three things: search for co-occurring, overlapping or nested codes; search for counter-evidence to these and identify the chronological sequence of the codes. Its reporting and hypothesis-testing functions allow HyperResearch to do all of these.
My use of HyperResearch: coding and retrieving
I slowly developed my coding schema in an intuitive way, starting with around 50 codes and building on these. According to Gittelsohn starting with something under 50 codes is about the norm (Gittelsohn, 1992, p 616). However the ideal number of codes is dependent upon the nature of the research project and the purposes for which CAQDAS is used: more for grounded theory construction (Glaser and Strauss, 1967), fewer for hypothesis-testing. Eventually my schema consisted of 157 codes organised in a tree-like structure going down four levels (for example CATS/FLEX/DIFCLTYS/ANONYMOUS: suggesting a comment about the perception that students tend to become anonymous to their lecturers in higher education’s flexible credit accumulation and transfer system). Around 40 of these codes were descriptive and most of the rest thematic. There were very few codes of the explanatory type, primarily because I had already written explanatory notes on the partially transcribed interviews. Much thinking in qualitative research is done through writing notes, ‘research memos’, about three discrete types of things: the substantive issue at hand, the direction of analysis and methodological issues (Burgess, 1984). There is little provision to integrate these into HyperResearch, unlike some other programs (NUDIST, for example), and my solution to this seemed to work well - to include these in the transcript (distinguishing them from the respondent’s text by the use of capital letters) and then coding them in the normal way. I also kept separate databases of fieldnotes, research memos and so on. While HyperResearch was not used to interrogate these, the codes I developed for use with the program proved a good and consistent means of classification.
Because most of them had been written before I developed the coding schema I went back to the databases and re-classified the records using the codes: this proved a useful way of reviewing, developing and evaluating both the code classification scheme and the database entries and organisation.
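Slash-delimited codes of this kind form a simple tree, which is what makes such a schema easy to review level by level. A minimal sketch of grouping codes by their hierarchy (the codes other than CATS/FLEX/DIFCLTYS/ANONYMOUS are invented for illustration; this is not how HyperResearch stores its code list):

```python
# Build a nested dict from slash-delimited hierarchical codes.
# Only CATS/FLEX/DIFCLTYS/ANONYMOUS appears in the text; the rest are hypothetical.
codes = [
    "CATS/FLEX/DIFCLTYS/ANONYMOUS",
    "CATS/FLEX/BENEFITS",
    "Ideology/New Right",
]

tree = {}
for code in codes:
    node = tree
    for part in code.split("/"):
        # Descend, creating a child branch if it does not yet exist.
        node = node.setdefault(part, {})

print(sorted(tree["CATS"]["FLEX"]))  # the branches under CATS/FLEX
```

A structure like this makes it straightforward to list every code under a branch when reviewing or pruning the schema.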
Having coded the interview transcripts I then used HyperResearch’s analytical facilities. For reasons explored below this was limited to the first two of its three functions. The retrieval function proved very useful for three purposes: for hypothesis-testing; for ‘grounded theory’ development and for illustrative quotation. Each can be illustrated using interviewees’ comments about the accreditation of prior experiential learning (APEL).
Butterworth (1990) argues that there are two approaches to APEL, the credit-exchange model and the developmental model. In the former the candidate for admission or advanced standing in higher education merely has to show that they have knowledge, understanding and/or abilities which match those needed for entrance to or exemption from all or part of a course of study. This is easily assessed by lecturers in an appropriate way. In the developmental model, however, learning is extracted from experience through a process of assisted reflection. A portfolio is developed for assessment which contains a description of the relevant experience, reflection upon the learning from it, the learning outcomes achieved (some of which are acquired in the reflective process) and evidence to authenticate the account. I hypothesised that only the developmental model would be applicable in higher education, regardless of discipline, the credit-exchange model being limited to lower level learning. However, retrieval of all comments on APEL from the transcripts showed this was not the case and that for parts of HE courses (for example pattern-cutting in Fashion Design and computer programming in Computing) credit-exchange was appropriate and used enthusiastically by some lecturers. This finding led me to think more critically about Butterworth’s polarisation of the two APEL models and to re-consider the distinguishing characteristics of higher education.
More than just hypothesis-testing came out of this exercise, though. The extracts thus isolated showed that there were some cases where the APEL process was being used for a variety of latent as well as (or even instead of) its manifest purposes (Merton, 1968). For example some staff were using it to ‘cool the mark out’ (Goffman, 1962) when the real reasons for candidates’ rejection for a course were not connected to their APEL submission. Others were effectively using it as a screen for personality rather than knowledge and abilities (Barkatoolah, 1985). The importance of the latent functions of APEL had not been apparent to me before the data retrieval exercise because the issue was peripheral to the research questions I focused on (though obviously an important finding in itself) and so each example had been ‘lost’ in the mass of data. This is an example of the development of ‘grounded theory’: ideas derived inductively from the data, in this case with the assistance of HyperResearch, and shows how the recontextualisation (Tesch, 1990) of data can be useful in this regard. More detail on my substantive findings and conclusions can be found in Trowler (1996a).
Finally at the writing-up stage the retrieval process facilitated the easy and rapid selection of appropriate quotations from the interviews to illustrate points. Finding the pithy comment that sums up or illustrates an argument using the respondent’s voice is very important in qualitative research: it gets the point across and gives confidence in the researcher’s interpretation, thus appearing to substantiate the truth claims of his or her ‘story’. HyperResearch will quickly offer all the potential extracts for quotation and picking the ‘best’ becomes a relatively easy matter.
The second of HyperResearch’s functions, conducting Boolean searches, also proved valuable. This function enabled me to explore whether there was an association between a respondent’s educational ideology and their age, for example (there was), between their gender and the way they approached the interview situation (there was), between attitudes on specific aspects of higher education such as franchised provision in colleges and changes to the system more broadly (there was) and so on. It also allowed me to quantify these things rather than using the ethnographer’s more usual vague statement of the sort "many older respondents reported that...". If one subscribes to Hammersley’s (1990) views about the need to synthesise qualitative and quantitative approaches it is important to be able to do this.
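A Boolean search of this kind amounts to set operations over the codes applied to each case, which is also what makes the simple quantification possible. The sketch below (hypothetical codes and case data; not HyperResearch's file format or query syntax) shows the underlying logic:

```python
# Hypothetical coded data: for each case (interviewee), the set of codes applied.
cases = {
    "int01": {"male", "Ideology/New Right", "HE Sys/Access/Worries"},
    "int02": {"female", "Ideology/Progressive"},
    "int03": {"male", "Ideology/New Right"},
}

def query(cases, all_of=(), any_of=(), none_of=()):
    """Return ids of cases whose code sets satisfy a simple Boolean query."""
    hits = []
    for case_id, codes in cases.items():
        ok = (all(c in codes for c in all_of)
              and (not any_of or any(c in codes for c in any_of))
              and not any(c in codes for c in none_of))
        if ok:
            hits.append(case_id)
    return hits

# All male respondents coded with both New Right ideology and access worries:
print(query(cases, all_of=("male", "Ideology/New Right", "HE Sys/Access/Worries")))

# Quantification rather than a vague "many respondents...":
print(len(query(cases, all_of=("Ideology/New Right",))))
```

Counting the cases returned by a query is what turns "many older respondents reported that..." into an exact figure.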
Some reflections on the ‘approaches and impact’ debate
There is evidence from my experience to confirm points made by convergence theorists about the detrimental impact of CAQDAS. I will deal with these first and then draw out some qualifications about them.
1. Emphasis on the interviews
My experience was that there was a tendency to focus initially on the interview transcripts during the analytical phase rather than the three other data types. Primarily this was because their digital format meant that HyperResearch offered an easy way to access, categorise and analyse them: the very reason I was using it. This meant that in thinking about the meaning of the data I thought first about the interviews and only later searched other data types for confirmation or refutation.
This procedure may have had important implications for the generation of ideas about the issues I was addressing. My suspicion is that this would have happened whether I was employing CAQDAS or not: compared to reading another study of the institution or collecting documentary evidence conducting interviews tends to be a highly memorable experience and transcribing the tapes (as I did in 30 of the 50 interviews) is long and laborious work. However this effect was certainly magnified by the use of CAQDAS hence giving the whole picture a less grounded character.
2. The research sequence
Related to this emphasis on the text was the issue of the research sequence. Many commentators on qualitative data analysis (for example Miles and Huberman, 1994) stress that analysis should not be thought of as the stage after data collection, rather that the process should be an iterative one with on-going data analysis informing the data collection. Doing this makes it possible to identify potentially valuable data sources that would otherwise go unrecognised, to limit errors and omissions to the early stages of data collection, to inform interview practices and hence improve the quality of subsequent data collection and so on. The logic of using code-and-retrieve software, however, is a sequential one: collect data > transcribe > code > analyse. Moving outside this sequence presents problems. For example if coding is done over a period of time then the meaning of the codes begins to change as theory-building and concept formation develops. Though this also happens if the analysis is done only after data collection its occurrence over an extended time period makes it more difficult to monitor and correct. A text unit assigned a code early on may flag a different issue than a sequence assigned the same code later. Weaver and Atkinson rightly say that
"it requires quite a skill to be constantly concerned about the implications of newly developed categories for previously coded data, and minimising the possibility of forgetting to make the necessary modifications." (Weaver and Atkinson, 1994, p 51).
Conducting the research process in a non-linear, iterative way makes that skill an even more demanding one.
I did not formally begin to analyse the data until the final year of the five year project and I had a rather limited idea of what my conclusions might be until about half-way through that year. The fact that I needed all the interview transcripts before I could use HyperResearch to best advantage was only one reason for this, but it was an important one. Again its effect was to accentuate a phenomenon that would have occurred anyway.
However it was not only formal data analysis that came later in the research sequence, so did epistemological and ontological thinking. In the early stages researchers are too concerned with the practical issues of data collection, literature searches etc to attend to questions about the ontological status of their data, for example. Indeed, the tendency is to avoid these questions as they tend to raise doubts about the value of the data so arduously collected. Ideally such questions, like data analysis, need to come early in the research sequence, and for the same sorts of reasons. Leaving these to the end may mean that most researchers are indeed technological (and epistemological) somnambulists, waking to face the hard questions only when it is (possibly) too late. The intention to use a program such as HyperResearch only diminishes the motivation to address these questions early because they can easily (and wrongly) be classified as ‘analytical’ and safely left until that phase of the research.
3. The decontextualisation problem
According to Renata Tesch, in the code-and-retrieve process researchers identify in their data documents smaller parts or "analysis units". By this she means "a segment of text that is comprehensible by itself and contains one idea, episode, or piece of information." Having identified these the task is to "de-contextualise" them, to separate relevant portions of the data, appropriately coded, from their context. The researcher can then "read in a continuous fashion about everyone’s attitude toward the electronic network, for instance". She refers to this assembling process as "re-contextualisation", the point of which is to make the analytical task more straightforward (Tesch, 1990, pp 115-118).
There are a number of problems with this. First is the issue of decontextualisation. Though this can have advantages, as I showed in the APEL example above, it also abstracts the ‘meaning unit’ from its context and thus risks the analyst misinterpreting it or missing the significance of the discursive context in which it was produced: an absolutely vital aspect of textual production for a discourse analyst, for example. The production of discourse is complex: people perform a whole range of different acts in their talk and this gives rise to variations in the accounts they give. The resultant text rarely contains only "one [unambiguous] idea": interview text like other forms is polysemic, containing many potential readings. In short, the ‘encoding’ and ‘decoding’ of text by the interviewee and interviewer respectively (for example) is a sophisticated business and decoding needs to be done with the text viewed in context. Particularly in the writing-up phase I found that awareness of discursive context, of the potential ambiguity of meaning and of the occlusion of discursive intention became dimmed as I searched in a rather instrumental way for the concise but well-turned phrase from a respondent which illustrated a point I wanted to make.
4. Mixed paradigms and the ‘linguistic turn’
This discussion leads me to the problem of confusing the paradigm within which one is operating. Positivist and interpretive approaches to research are founded on very different ontological and epistemological bases. Positivists tend not to be particularly concerned about the social construction of reality in micro-universes and therefore are relaxed about the comparability of different contexts if they are ‘objectively’ similar. Moreover they tend to take text at face value, rather than seeking to deconstruct it. By contrast interpretivists, sometimes influenced by postmodernist thinking, are usually very concerned about these issues. For Zygmunt Bauman there are "many ‘life-worlds’, many ‘traditions’ and many ‘language games’" (Bauman, 1988, p 229) and the consequence is that one needs to take a highly interpretative approach towards the data. As Coffey et al suggest, there is a danger of CAQDAS leading the researcher towards the positivist paradigm so that it begins to assume the characteristics of market research. Weaver and Atkinson point to this danger too:
"Coding strategies involve important assumptions and decisions which fundamentally direct data analysis. A failure to recognise the conditions and implications of these, in all their complexity, is a methodological weakness of much qualitative research." (Weaver and Atkinson, 1994, p 31).
I was intuitively aware of this danger and this was the reason I avoided HyperResearch’s hypothesis-testing function which I viewed as a crass implementation of the hypothetico-deductive approach often thought to underpin positivism. However, with Hammersley’s arguments in mind about the need to clearly delimit research findings and the claims made for them, I did use simple forms of quantification to attempt to ‘substantiate’ the story. For example:
"Only 6 of the interviewees were explicitly against the idea [of franchised HE provision], either for their own discipline in particular or in general terms, while 24 were clearly in favour, with 3 of these seeing no particular problems with it." (Trowler, 1996b, p 110).
However this is to reduce complex arguments and polysemic interview text to an extremely low common denominator and on reflection I regret making this kind of statement. I was tempted into doing so because HyperResearch made it so easy.
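Just how easy such quantification becomes once the data are coded can be sketched in a few lines (a toy illustration of the general point, not HyperResearch's actual mechanism; the stance labels and tallies below are invented for the sketch):

```python
# Toy illustration: once each interviewee's stance has been coded,
# reducing polysemic interview text to a head-count is a one-liner.
# The labels and numbers here are invented, not the study's data.
from collections import Counter

# one stance code per interviewee, as retrieved from a coded corpus
stances = ["against"] * 6 + ["in favour"] * 24

tally = Counter(stances)
print(tally["in favour"], tally["against"])  # 24 6
```

The very triviality of the operation is the point: the software removes all friction from a move that, analytically, deserves a great deal of it.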
However, despite this evidence for some technological conditioning of my ‘methods of thought’ and general research approach, the effects of this were limited for four main reasons, outlined below.
Focusing on the text
First, while the code-and-retrieve approach does contain the danger of epistemological over-simplification, there is a balancing effect which results from the need to give very detailed attention to all of the data being coded. This close reading is, additionally, necessarily conducted in a highly reflective way: which code applies here? is there a requirement for a new code? which of these possible codes is the more appropriate, or is multi-coding necessary in this instance? what is the respondent actually saying here and is there a sub-text that needs coding? In a way this necessary but demanding procedure brings the coding process nearer to discourse analysis than content analysis. Although I had not read their text, I was intuitively aware of the dangers of ‘sloppy’ or inconsistent coding that Weaver and Atkinson highlight:
"If data are not systematically coded in terms of all variables, then the construction of propositions will be grounded only in parts of the data, and the testing of propositions (through searching for combinations of codes) will be inaccurate." (Weaver and Atkinson, 1994, p 43).
Awareness of this, and of the fact that careless coding quickly undermines and makes valueless all the work that has gone on before (not to mention the expense of interview tape transcription), means that there is a constant incentive to code very carefully and to code prolifically, to ‘over-code’ on the grounds that this can be retrieved during a later phase of analysis whereas sloppy coding cannot. In short code-and-retrieve software pushes the analyst towards rigour in this regard.
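The underlying mechanics of code-and-retrieve can be sketched very simply (a minimal toy of my own, assuming nothing about HyperResearch's actual implementation; the class, sources and excerpts are invented for illustration): each code labels a segment of source text, possibly alongside other codes, and retrieval pulls back every segment carrying a given code, abstracted from its context.

```python
# A toy sketch of the code-and-retrieve idea: codes index excerpts of
# source text; analysis proceeds by retrieving all excerpts per code.
from collections import defaultdict

class CodeAndRetrieve:
    def __init__(self):
        # code name -> list of (source, excerpt) pairs
        self.index = defaultdict(list)

    def code(self, source, excerpt, *codes):
        """Attach one or more codes to an excerpt (multi-coding)."""
        for c in codes:
            self.index[c].append((source, excerpt))

    def retrieve(self, code):
        """Every excerpt carrying a code, lifted out of its context."""
        return self.index.get(code, [])

db = CodeAndRetrieve()
db.code("interview03",
        "I could isolate my own personal background...",
        "academics/presage")
db.code("interview03",
        "[courses as] product ... [student as] customer",
        "ideology/new right")
```

Note that `retrieve` returns decontextualised fragments: the sketch makes visible exactly the abstraction from discursive context discussed earlier.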
Let me give an example from interview 03. The following textual fragment was coded academics/presage:
"[as to my] views of the role of Higher Education, I could isolate three things, I could isolate my own personal background, I could isolate time spent in the United States as an influence, and then thirdly my own professional background within this institution."
This respondent then went on to articulate in detail the nature and impact of each of these three influences, showing how they led him to a concern for access issues in higher education, to favour expansion and to be critical of academic elitism. What was being offered was a highly structured, coherent ‘story’ about the influence of background on current thinking, rooted in a social democratic concern for equality of opportunity and education viewed as an instrument for meritocratic social mobility. However, littered throughout the interview text, and highlighted by the detailed coding process, were discursive traces of other influences in phrases such as: [courses as] "product"; "market forces"; [student as] "customer"; [university as] "brokerage firm" and "consumerist choice". These I coded ideology/new right. The apparently paradoxical relationship between these two strands interwoven in the interview, highlighted by the coding process, led to careful thinking about the nature of discursive production and to the development of further codes to apply to similar cases. Apparent in the data as a whole, such discursive inconsistencies actually led to a greater rather than lesser awareness on my part of the complex ontological status of these kinds of data.
My experience was also that careful coding made explicit my changing understanding of the data as I interacted with them. The struggle to develop an appropriate conceptual and theoretical explanatory framework for the data intensified during the final year of the project and the development of the coding schema became an articulation of this: some codes were renamed, split, merged with others or deleted as the framework developed. The need to ensure that the schema adequately reflected the theoretical framework meant that the development of the latter became a far more explicit process for me than it would otherwise have been. This had the important advantage of making me aware of the twin dangers of cognitive closure and the holistic fallacy: beginning to see all the data in terms of (and as confirming) a developing theoretical account and of perceiving too many patterns and too much coherence in them. To counter these dangers we are urged in the data analysis literature (eg Strauss and Corbin, 1990, p 253) to search for evidence which may refute any developing explanatory framework. Fear of falling into these traps led me to adopt the techniques of extravagant multi-code reporting and of frequently returning to the full transcripts during the final analytical phase so that disconfirming as well as confirming evidence would be apparent.
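This kind of schema maintenance can itself be sketched (an illustrative toy, assuming nothing about HyperResearch's interface; the code names and excerpts are invented): as the theoretical framework develops, codes are renamed or merged, and the index of coded excerpts must follow the schema.

```python
# Toy sketch of coding-schema maintenance: renaming and merging codes
# while keeping their coded excerpts. Names and excerpts are invented.
index = {
    "ideology/market": ["excerpt A"],
    "ideology/consumerist": ["excerpt B"],
}

def rename(index, old, new):
    """Move all excerpts from one code to another (possibly new) code."""
    index.setdefault(new, []).extend(index.pop(old, []))

def merge(index, sources, new):
    """Collapse several codes into one as the framework develops."""
    for old in sources:
        rename(index, old, new)

merge(index, ["ideology/market", "ideology/consumerist"],
      "ideology/new right")
```

The discipline of keeping index and schema aligned is what forces the theoretical framework into explicitness.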
The flexibility of HyperResearch
Linked to this is the third point: that HyperResearch is much more versatile than the literature on CAQDAS would suggest, demonstrating that diversity of approach is possible within a single program.
The literature on the coding process using a code-and-retrieve approach suggests that it can take one of two forms: inductive (Glaser and Strauss, 1967; Becker, Gordon and LeBailly, 1984) or deductive (Miles and Huberman, 1994) coding. Inductive coding (or ‘thematic induction’: Tesch, 1990) is rooted in a grounded theory approach and applies when codes are developed from the text rather than in advance. In deductive coding the codes are pre-defined from prior theoretical work and their function is to make retrieval easier rather than, primarily, to assist in theory-building, as is the case with inductive coding.
However in practice the two approaches are not mutually exclusive: HyperResearch is flexible enough to allow both. I developed several codes in advance, expecting to use HyperResearch deductively. For example the codes Academics/proletarianisation/pro (and con) were prepared in advance and arose out of my reading about the effects of credit accumulation and transfer schemes (CATS). I applied the codes wherever there was evidence pertaining to the theory of academic proletarianisation on either side of the debate. However the ideas which resulted from the data made it virtually impossible not to develop and later refine a considerable number of other codes which naturally arose from them, as I argued above. Likewise the researcher is drawn to make judgements about the significance of the data when codes determined in advance prove inappropriate or insufficient to deal with them.
There is another sense in which the two approaches are not as distinct as they would appear from the literature on data analysis. The inductive approach, while in theory taking the concepts from the data, in fact incorporates prior theory: observation is never independent of theory, as Merton reminds us:
"Despite the etymology of the term, data are never ‘given’ but are ‘contrived’ with the inevitable help of concepts." (Merton, 1949, p 370).
What we come to see depends upon what we are looking for, the prior intellectual frameworks we have acquired and the discursive repertoires available to us. Observation is inevitably permeated with theory. On the other hand conceptual and theoretical approaches are influenced by observation so that, for example, my understanding of the nature and types of educational ideologies was shifted as a result of the data I collected from the fieldwork. Theory and observation, therefore, are inextricably intertwined.
If the differences in character of the ‘two’ approaches have been exaggerated, so have their purposes. Christensen (1992, p 6, quoted in Weaver and Atkinson, 1994, p 24) is wrong to say of deductive coding that it has the disadvantage of not being able to "facilitate further understanding of the material. The analysis may be reduced to some kind of mechanical application." This is illustrated by my coding for views on CATS and the accreditation of prior experiential learning (APEL). ‘Deductive’ coding, based primarily on my reading of Butterworth’s work, led to considerably greater understanding of the material and ultimately a sustained critique of Butterworth’s approach (see Trowler, 1996a). Conversely inductive coding was very helpful simply in organising the data, particularly in providing classificatory devices for field-notes of observations and reviews of and comments on secondary data.
HyperResearch and the whole research project
Fourthly, however, is the issue of the overall role of HyperResearch within the research project considered as a whole. Though HyperResearch proved valuable in illuminating and illustrating the ‘nooks and crannies’ of the findings of this research project, and probably extending it further than it would have otherwise gone, it was not at all instrumental in the larger-scale theoretical development associated with the research. The most important findings were both methodological and theoretical: how to understand the nature of cultures and their influence on policy implementation in university contexts; the relevance of this for ethnographic methodology; the limitations of the ways in which academics’ responses to change have traditionally been viewed; the gendered nature of these issues and so on. These findings were derived from the data and the literature in the ways traditionally associated with qualitative research: by reading, mulling things over, going back to the data, talking with others, attending relevant conferences and so on.
When I began this research I was far from being the discriminating chooser depicted by Weaver and Atkinson (1994, pp 158 and 165) who uses explicit criteria to select the appropriate software according to its relative strengths, weaknesses and approaches to analysis, or to decide not to use CAQDAS at all. Neither was I aware of the various ‘sub-strategies’ possible within the one I finally selected. Moreover I was ignorant of the debate outlined above concerning the potential unintended consequences of the use of CAQDAS. I was a technological somnambulist, in short.
Despite that, however, the conclusion I draw from this reflective exercise on my experience is that though there was some degree of technological conditioning operating this had a benign as well as malign influence. That experience suggests, moreover, that its influence is not powerful enough to warrant the concerns expressed by those subscribing to the convergence model of CAQDAS impact even when one discounts its benign effects.
This does not mean, however, that one should be complacent about the use of this kind of software. I would concur with Weaver and Atkinson’s comment that "[t]here is a continuing need for critical reflection by analysts of their own implementation of microcomputer strategies" (Weaver and Atkinson, 1994, p 2). There are more choices available in both the selection of the software type and its use, particularly in the coding process, than is obvious to the novice and it is obviously better to be aware of ‘tendencies’ within programs than ignorant of them. All analysis techniques re-organise and ‘change’ the data in some way. The choice of strategy will inevitably affect the way the data are changed, whether computer use is involved or not. At least with a computer-based strategy the techniques used are explicit and to some extent available to others: this improves the extent to which the ‘symptoms of truth’ (McCracken, 1988) are apparent in a particular piece of research. Glaser and Strauss suggest that...
"[t]he researcher ought to provide sufficiently clear statements of theory and description so that readers can carefully assess the credibility of the theoretical framework he [sic] offers." (Glaser and Strauss, 1967, p 232).
The use of HyperResearch has the important advantage of making such statements more explicit not only for readers but for the researchers themselves during the theoretical development phase. It has an intrinsic tendency to encourage the analyst to move towards epistemological awareness, and even sophistication. However, there was nothing about its substantive contribution to this research project that could not have been derived from non-computer-based analytical procedures, though of course it is far more efficient than most of these other approaches could be. In that sense, then, there is no evidence from this example of the implementation of code-and-retrieve software to support the repertoire-enhancement theorists’ case either. This is not necessarily true of other qualitative data analysis software types, though, and at least some of the supporters of both models agree that the hypertext and hypermedia approach has intriguing analytic possibilities (Coffey et al, 1996; Lee and Fielding, 1996). As far as the code-and-retrieve approach is concerned, however, my experience suggests that both models exaggerate the significance of its impact.
I commend HyperResearch to those who consider code-and-retrieve an appropriate analytical approach in their research, but with a health warning appended which reads: "like any other analytical tool, this has its own logic: approach it reflectively".
References

Aljunid, S. (1996). How to do (or not to do)...: Computer Analysis of Qualitative Data: the use of ethnograph. Health and Policy Planning, 11, 1, 107-111.
Barkatoolah, A. (1985). Some Critical Issues Related to Assessment and Accreditation of Adults’ Prior Experiential Learning. In S. Weil, and S. McGill (Eds.) Making Sense of Experiential Learning. Buckingham: Open University Press/SRHE.
Bauman, Z. (1988). Is There a Postmodern Sociology?, Theory, Culture and Society, 5, 217-237.
Becker, H. S., Gordon, A. C. and LeBailly, R. K. (1984). Fieldwork with the Computer: criteria for assessing systems. Qualitative Sociology, 7, 1-2, 16-33.
Burgess, R. (1984). In The Field. London: George Allen and Unwin.
Butterworth, C. (1992). More Than One Bite at the APEL. Journal of Further and Higher Education, 16, 3, Autumn, 39-51.
Christensen, B. M. (1992). Conceptual Understanding By Use of Computers. Paper presented to the International Human Science Research Association Conference.
Coffey, A., Holbrook, B. and Atkinson, P. (1996). Qualitative Data Analysis: technologies and representations. Sociological Research Online, 1, 1. http://www.socresonline.org.uk/socresonline/.
Fielding, N. G. and Lee, R. M. (1995). Confronting CAQDAS: Choice and Contingency. In R. G. Burgess (Ed.) Studies in Qualitative Methodology vol. 5: Computing and Qualitative Research. Greenwich CT: JAI Press.
Fielding, N. G. (1991). Qualitative Data Analysis Packages in Teaching. Psychology Software News, 2, 3, 74-78.
Fielding, N. G. and Lee, R. M. (eds) (1991). Using Computers in Qualitative Research. London: Sage.
Gittelsohn, J. (1992). An Approach to the Management and Coding of Qualitative Data Using Microcomputers. Indian Journal of Social Work. 53, 4, 611-200.
Glaser, B. and Strauss, A. (1967). The Discovery of Grounded Theory. Chicago: Aldine.
Goffman, E. (1962). On Cooling the Mark Out: some aspects of adaptation to failure. in A. M. Rose (Ed.) Human Behaviour and Social Processes: an international approach. London: RKP, 482-505.
Hammersley, M. (1990). Schoolroom Ethnography. Buckingham: Open University Press.
Hesse-Biber, S., Dupuis, P. and Kinder, S. (1990). HyperResearch: a computer program for the analysis of qualitative data using the Macintosh. International Journal of Qualitative Studies in Education, 3, 2, 189-193.
Hesse-Biber, S., Kinder, S., Dupuis, P. R., Dupuis, A. and Tornabene, E. (1991). HyperResearch from Researchware: a content analysis tool for the qualitative researcher. Randolph, MA: Researchware.
Lee, R. M. and Fielding, N. G. (1996). Qualitative Data Analysis: representations of a technology: a comment on Coffey, Holbrook and Atkinson. Sociological Research Online, 1, 4. http://www.socresonline.org.uk/socresonline/.
McCracken, G. (1988). The Long Interview. Beverley Hills: Sage.
Merton, R. (1949). Social Theory and Social Structure. New York: The Free Press.
Merton, R. (1968). Social Structure and Anomie. in R. Merton (Ed.) Social Theory and Social Structure. New York: New York Free Press.
Miles, M. B. and Huberman, M. A. (1994, second edition). Qualitative Data Analysis. Beverley Hills: Sage.
Pfaffenberger, B. (1988). Microcomputer Applications in Qualitative Research. Beverley Hills: Sage.
Potter, J. and Wetherell, M. (1987). Discourse and Social Psychology. London: Sage.
Richards, L. and Richards, T. (1991). The Transformation of Qualitative Method: Computational Paradigms and Research Processes. In N. G. Fielding and R. M. Lee (Eds.) op cit.
Seidel, J. (1991). Method and Madness in the Application of Computer Technology to Qualitative Data Analysis. In N. G. Fielding and R. M. Lee, (Eds.) op cit.
Strauss, A. and Corbin, J. (1990). Basics of Qualitative Research: grounded theory procedures and techniques. Newbury Park, Ca: Sage.
Tesch, R. (1990). Qualitative Research. Lewes: Falmer.
Trowler, P. (1996a). Angels in Marble? Accrediting Prior Experiential Learning in Higher Education. Studies in Higher Education, 21, 1, 17-30.
Trowler, P. (1996b). Academic Responses to Policy Change in a Single Institution: A Case Study of Attitudes and Behaviour Related to the Implementation of Curriculum Policy in an Expanded Higher Education Context During a Period of Resource Constraint. Unpublished PhD Thesis, Lancaster University.
Walker, B. L. (1993). Computer Analysis of Qualitative Data: a comparison of three packages. Qualitative Health Research, 3, 1, 91-111.
Weaver, A. and Atkinson, P. (1994). Microcomputing and Qualitative Data Analysis: Cardiff Papers in Qualitative Research. Aldershot: Avebury.