The Effect of WWW Document Structure on Students' Information Retrieval

Abstract: This experiment investigated the effect that the structure of a WWW document has on the amount of information retained by a reader. Three structures common on the Internet were tested: one long page; a table of contents leading to individual sections; and short sections of text on separate pages with revision questions. Participants read information structured in one of these ways and were then tested on recall of that information. A further experiment investigated the effect that 'browsing' (moving between pages) has on retrieval. There was no difference between the structures in the overall amount of information retained. The single page version was best for recall of facts, while the short sections of text with revision questions led to the most accurate inferences from the material. Browsing on its own had no significant impact on information retrieval. Revision questions, rather than structure per se, were therefore the key factor.
Reviewers: John Errington (U. Northumbria at Newcastle, UK), Xiufeng Liu (St. Francis Xavier U., CA), Sandra Wills (U. Wollongong, AU)
 Interactive elements: The three Web document designs contrasted in the experiments are provided. 
 Interactive demonstrations: The websites contrasted in the experiments are explained in Sections 1.1.1 - 1.1.3 (Expt. 1), and 3.2 (Expt. 2), with links to the respective websites


Introduction
1.1 Rationale
Hypertext is a computing system first proposed by Vannevar Bush (1945). It consists of a series of on-line texts linked by 'hotspots': important words or phrases which are highlighted in the text. By clicking on these hotspots, users can 'jump' between sections of the text.
The World-Wide Web (WWW) has greatly popularised hypertext. It was designed as an information dissemination tool, but educational materials are migrating to the WWW at great speed. Good design for one purpose, however, may not be good for the other.
One of hypertext's most important design features is how information is structured. Does this affect how much students learn from hypertexts? At one extreme, the document may be little different from a book: one long section of text. At the other, individual paragraphs, with links between them, may take up a 'page' each. Sections may also include questions to encourage readers to review the material as it is read.
This experiment investigates whether this structure has an effect on the amount of information a reader can recall a short time later. Three structures common on the Internet were chosen as a representative range, as follows.

Unified text version
Paragraphs are the natural unit of text. Hypertext systems use them to create natural section breaks. Nash, however, cautioned that it is "a mistake to think of them as self-contained units, the 'building blocks' of text" (1980, p8). Arguments flow between paragraphs; splitting them up breaks the links which serve to reinforce the author's message. Peter Whalley (1993) argues strongly that serving up linear, cohesive texts as small chunks does not encourage deep learning. He cites Nash on the expository approach to writing: "There is a programme of assertions, examples, qualifications, but they are not presented as a series of distinctly labelled positions. Instead they are related to each other in a progressive unfolding pattern, the turns and connections of which are demonstrated in various ways." (p.11) By splitting up a text into short, independent sections, a hypertext author loses this coherence and reinforcement. While hypertext is ideal for presenting small pieces of unrelated information, in an encyclopaedic form, educational materials are rarely structured in this manner. Whalley predicts that "the fragmentation effect in hypertext… is likely to make it more difficult for the learner to perceive the author's intended argument structure" (p.11).

[...] elaboration as one explanation of this. When material is processed semantically, it is stored in many different, elaborated ways, improving subsequent recall. Walker, Jones and Mar (1983) suggested an additional mechanism: that increased cognitive effort (defined as the proportion of available cognitive resources consumed by a task) would increase recall, by creating a more distinctive memory trace.
This document structure encourages this elaboration and extra processing. After reading a short section of material, students are asked a question. This forces them to review and re-interpret the previous material, leading to more processing and more elaborate storage. If the question is answered correctly, the next section of material is shown. If it is not, an explanation of the correct answer is given before participants move on to the next page.
The test version is linked to the web version of this document.[4]
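The page flow described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the code behind the original WWW pages: the `Section` class, the `run_active_review` function and the exact-match answer check are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """One short page of the (hypothetical) active review hypertext."""
    text: str          # short passage shown to the reader
    question: str      # revision question asked after the passage
    answer: str        # expected answer (assumed exact-match check)
    explanation: str   # shown only when the answer is wrong


def run_active_review(sections, get_answer):
    """Walk a reader through the sections; return the pages shown, in order."""
    shown = []
    for section in sections:
        shown.append(section.text)
        reply = get_answer(section.question)
        if reply.strip().lower() != section.answer.lower():
            # Wrong answer: explain the correct one before moving on.
            shown.append(section.explanation)
        # Correct answer: fall through to the next section directly.
    return shown
```

Here `get_answer` stands in for whatever collects the reader's response; in a test it can simply be a function returning a fixed string.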

1.2 Reproductive and reconstructive memory
Wickelgren (1972) makes an important distinction in learning. Reproductive memory stores simple facts. Reconstructive memory requires inferences to be made from learned material. While reproductive learning is factual, reconstructive learning is more semantic in nature. Samarapungavan and Beishuizen (1990), using a hypertext containing several different perspectives on the same material, found an improvement only in participants' reconstructive learning.

These experiments
Experiment 1, entitled 'Comparing three common document structures', compares the three hypertext versions above. Experiment 2, entitled 'The browsing effect', compares the unified and hierarchical versions to uncover any effect caused by browsing, where users have to flick backwards and forwards between small sub-sections of text presented individually. This is expected to disrupt concentration and Nash's (1980) expository flow, and therefore learning.
Reproductive and reconstructive questions were included in the tests participants took after studying one of the hypertexts, to uncover effects on both types of learning.
The hypothesis was that document structure would affect learning, in both overall scores and the separated reproductive and reconstructive scores. The active review hypertext was expected to be most successful, due to the increased elaboration and processing it causes. The hierarchical version was expected to be least successful, due to a detrimental browsing effect not counterbalanced by any semantic factors.

Method
The 42 participants were new Computing Science students. While some had studied the subject, the majority had minimal computing experience. They were randomly selected during three 'computer introduction' sessions in a computing laboratory.
Participants were randomly assigned to one of three groups corresponding to the unified, hierarchical or active review hypertexts. The documents' contents concerned Joan Miró, a relatively obscure Spanish artist unknown to the students (this was checked with each participant). The hypertexts differed in structure and the presence or absence of revision questions. The pages were designed to take about two minutes to read, allowing students to review them in the allocated five minutes.
Netscape Navigator (a WWW browser) was set up on Macintosh LC475 computers at the relevant starting page.
Each participant was instructed to read the page(s) for five minutes with the intention of taking a paper-based test afterwards (in Appendix 1).[5] Students had already been taught how to use the browser.
After studying their assigned WWW document for five minutes, participants took a test containing seven reconstructive and thirteen reproductive randomly-ordered questions on the hypertext. Five minutes were allowed for the test.
Participants started the experiment at different times and sat at a distance from each other. The question sheet reminded subjects that the test results were anonymous, and asked for their cooperation in completing the questions without help. This was given in all cases. As a follow-up, a poster was displayed in the first-year computing area showing results and a WWW address for further information and queries.
[5] Numerous studies have shown that intention to learn has little effect on learning (Hammond, 1993, p57). Students using Computer-Aided Learning (CAL) tools will (hopefully!) always intend to learn.

Results
Analysis of participants' total test scores[6] gave the following results.
Table 1: Total test scores on the three document structures
A one-way ANOVA test showed there was no significant difference between groups (F(2,41) = 0.16, p = 0.856).
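For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be computed as in the following minimal sketch. It assumes each group's scores are available as a plain list; the function name is hypothetical and no data from this study is reproduced. The degrees of freedom follow the textbook convention (k − 1 between groups, N − k within groups), which may not match the convention used in the values reported here.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of per-group score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

The p-value is then read from the F distribution with those degrees of freedom, for instance from a statistical table or package.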
Scores were then separated into reproductive and reconstructive question totals.

Reproductive questions
Figure 1 and Table 2 show that the unified and hierarchical groups scored more (0.9857 and 1.002 standard deviations respectively) on the test than the active review group. Scores were corrected for guessing on questions 5, 6, 7, 12, 13 and 16 (where the answer was yes/no) by deducting one point for an incorrect answer.
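The correction for guessing can be expressed as a short scoring routine. The sketch below assumes one point per correct answer, no penalty for wrong answers to other questions, and no penalty for unanswered questions; those three scoring details, like the function and variable names, are assumptions rather than details given above.

```python
# Question numbers with yes/no answers, as listed in the text.
YES_NO_QUESTIONS = {5, 6, 7, 12, 13, 16}


def corrected_score(responses, answer_key):
    """Score one participant, deducting a point for wrong yes/no answers.

    responses:  dict mapping question number -> answer given (may omit questions)
    answer_key: dict mapping question number -> correct answer
    """
    score = 0
    for question, correct in answer_key.items():
        given = responses.get(question)
        if given is None:
            continue  # assumption: blank answers neither gain nor lose points
        if given == correct:
            score += 1
        elif question in YES_NO_QUESTIONS:
            score -= 1  # guessing penalty on two-alternative questions
    return score
```

With two equally likely alternatives, a participant who guesses blindly on a yes/no question gains a point half the time and loses one half the time, so the expected gain from guessing is zero.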

Brown. Journal of Interactive Media in Education, 98 (12)
Table 2: Mean scores and standard deviations on the reproductive questions for each web structure
A one-way ANOVA test showed there was a significant difference between groups (F(2,41) = 4.24, p = 0.022). A post hoc Scheffé test showed there was a significant difference in test score between the unified and active review groups (F(2,41) = 3.25, p < .05) and the hierarchical and active review groups (F(2,41) = 3.26, p < .05) but not between the unified and hierarchical groups (F(2,41) = -4.05, p > .05).

Reconstructive questions
Figure 2 and Table 3 show that the active review group scored more on the test than the unified and hierarchical groups (0.8574 and 0.9889 standard deviations respectively).
Table 3: Mean scores and standard deviations on the reconstructive questions for each web structure
A one-way ANOVA test showed there was a significant difference between groups (F(2,41) = 3.73, p = 0.033). The active review group scored significantly more than the hierarchical group (t(23) = 2.75, p = 0.0057, one-tailed) and the unified group (t(26) = -2.33, p = 0.01415). There was no significant difference, however, between the unified and hierarchical versions (t(27) = -.36, p = 0.36, one-tailed).
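Differences between groups are expressed above in standard deviation units. One common way to compute such a standardised difference is sketched below. It divides the difference in means by the pooled standard deviation of the two groups; which standard deviation was actually used for the figures reported here is not stated, so the pooled form is an assumption, as is the function name.

```python
from math import sqrt
from statistics import mean, stdev


def standardised_difference(group_a, group_b):
    """Difference in means in pooled-standard-deviation units (assumed form)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)
```

A positive value means the first group scored higher on average; a value near 1 corresponds to the "full standard deviation" differences described in this paper.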

Experiment 2: The browsing effect
In order to separate the effects of revision questions from page size, this second experiment directly compares a unified and a hierarchical hypertext.

Method[7]
Twenty-eight students from a first-year 'Statistics for Psychology' course took part in this experiment as one of their class assignments. They had little previous statistical training, having no post-16 mathematics qualifications apart from this course. All had received instruction and practical experience in using a World Wide Web browser.
The participants were randomly assigned to one of two groups corresponding to a hierarchical or unified hypertext. The hierarchical hypertext[8] consisted of a contents page leading to a set of 13 pages. The unified hypertext[9] contained the same information in one page. This information was roughly twice as long as that given in Expt. 1: Comparing three common document structures. The hypertexts gave information on non-parametric tests, a new topic halfway through the statistics course. While participants may have differed in mathematical ability or previous learning, their random distribution between groups should prevent this influencing the results.
Netscape was set up at the correct starting page on a cluster of 486 Windows-based PCs. Each student was instructed to study their hypertext for 10 minutes, ready to take a 5-minute paper-based test afterwards. The test contained 10 reconstructive and 6 reproductive randomly-ordered questions. A handout was prepared for the following week's class explaining the results and thanking the students for their participation. A pointer to an e-mail address and WWW page was given for any further queries.

Results
Participants' scores[10], in total and separated into reproductive and reconstructive components, are shown below.

Discussion
WWW document structure does seem to have an effect on information retrieval. The different structures in Experiment 1 (Comparing three common document structures) led to a difference of a full standard deviation between the highest and lowest recall documents. Because reconstructive and reproductive learning were affected in opposite directions, there was no significant difference between total recall scores for the three document types.
The low ranking of the hierarchical hypertext for reconstructive learning seems to confirm the worries of Jonassen (1993) and Dillon et al. (1993) about the concept of semantic structures.
As Hammond (1993) states, "there are many situations where learning is most effective when the freedom of the learner is restricted to a relevant and helpful subset of activities" (p52). Dillon et al. make the important point that paper texts are also structured. Might they not be just as successful in conveying semantic information? Few documents consist of page after page of uninterrupted text. Section headings and bullet points are just two of the devices authors use to give structural information. The restructuring of a document that often occurs when it is moved on-line may be more important than any effect the medium itself has.
In addition, 'standard' structures such as the newspaper article (or an APA style report) are more familiar to readers than new structures created by hypertext designers. Kintsch and Yarborough (1982) showed that standard text structures prompt better understanding than non-conforming texts. Hypertext design tools, and the whole 'flash-bang' culture of the Internet, promote novelty as a positive attribute for WWW pages. For learning materials, this may be a mistake.
Even if hypertext structures do impart useful semantic information, "such high-level abstractions are always going to be in danger of 'spoon-feeding' students with structures they should be developing for themselves" (Whalley, 1993, p14). At university level, the development of such 'thinking skills' is perhaps even more important than the acquisition of knowledge (Bligh, 1977). Using revision questions in the text appears to be a better way of stimulating semantic learning. It is important, however, that they encourage rather than replace original thinking by the student. Conversely, it appears that splitting pages into small pieces has no detrimental effects on later information retrieval. Experiment 2 found that readers jumping between small sections of text recalled as much as those reading a long page of material. Nash's (1980) expository flow therefore seems unimportant in this context. It is a moot point whether clicking on a link, scrolling down a screen or turning the page of a book breaks concentration more. At least for short-term recall, the medium seems unimportant to the message.
The varying performance of the active review hypertext in Experiment 1 is initially puzzling. Samarapungavan and Beishuizen (1990) also found that encouraging students to take different perspectives on hypertexts improved what they called conceptual learning, but not factual recall.
Work by Neff Walker (1986) explains this difference. He showed that elaboration of memory traces actually decreases recall of factual information, by reducing the strength of pathways between the original proposition and other stored data. This reduces the likelihood of activation and thus recall of the original fact, which cannot be inferred from the elaborated information stored with it (unlike reconstructed facts). The elaboration caused by the revision questions in the active review hypertext was most likely responsible for participants' poor performance on reproductive recall. This is further evidence against Anderson and Reder's (1979) contention that elaboration increases the likelihood of activation of propositions adjacent to the original memory trace, and thus the probability of its recall.
Walker, Jones and Mar (1983) suggest that the increased amount of cognitive effort caused by the revision questions would improve the recall of that information by increasing the 'distinctiveness' of the memory trace. This approach has been criticised by Mitchell and Hunt (1989), who found in the literature a 'haphazard correlation between indexes of cognitive effort and of memory performance'. Semantic processing sometimes requires more cognitive effort, but not always (Eysenck and Eysenck, 1979). Experiment 1 appears to back up their conclusion that cognitive effort serves only as a boundary condition in memory performance: any extra processing caused by the revision questions resulted in diametrically opposed changes in recall performance for reconstructive and reproductive learning. The explanation that cognitive effort acts in the same manner as elaboration, in opposite directions for the two types of learning, would only be viable if creating elaborated semantic memories was a less costly process than storing facts. This would fly in the face of 20 years of cognitive science research.
Participants' cognitive resources were, in any case, unlikely to have been stretched to breaking point by the simple task of learning written material, something at which undergraduates will have had plenty of practice. Mitchell and Hunt propose that only when other unrelated cognitive processes 'crowd out' memory formation processes will high cognitive effort cause memory systems to fail. Changes in cognitive effort may be sufficient to produce changes in recall performance, but are not a pre-requisite.
It may also be that, in the five minutes available to participants in the first experiment, a trade-off had to be made between learning facts (where organisation is perhaps more important) and creating elaborated, inferential memories. The cueing of reconstructive learning caused by the revision questions in the active review hypertext may have distracted participants from reproductive learning. Memory is actually a good index of task interference (Mitchell and Hunt, 1989). As Walker (1986) says, "processes that rely upon direct retrieval do not play an important role in explaining the generally beneficial effect that elaborative processing has upon recall" (p.325). A Mitchell and Hunt-style boundary condition may have caused the processes initiated by the revision questions to prevent other memory processes completing successfully. This is the most common reason why memory fails.
It appears therefore that the revision questions were the key to the first experiment's results. The second experiment found no evidence of any browsing effect. There was a full standard deviation's difference between the active review hypertext and its two competitors in Experiment 1, positively for reconstructive learning and negatively for reproductive learning. It is ironic, although reassuring, that 'traditional' memory theory has proven more important in explaining this experiment's results than any new computing-specific theories. In return, this experiment has provided extra evidence against the hypotheses of Anderson and Reder (1979), and for those of Walker (1986).
The revision questions within the active review hypertext required a mix of reproductive and reconstructive answers. The test materials were not long enough to extract meaningful data on which type of question was more successful in encouraging either type of learning. This may be an important point for further work, with particular practical application for authors of learning materials (Whalley, personal communication). Some previous work has indicated that semantic tasks produce better recall than non-semantic tasks (Tyler, Hertel, McCallum and Ellis, 1979; Krinsky & Nelson, 1981). Mitchell and Hunt (1989) caution, however, that extra tasks must stimulate new memory processes above those caused by the original task to improve recall. Revision questions, especially those which encourage readers to consider the information given in new ways, would seem to fulfil this requirement. More complex question types, particularly where readers must assess their confidence in their answers, could encourage this process further.
The active review hypertext has the advantage that students are forced to answer the questions before proceeding, unlike paper texts! Like the unified hypertext, it also ensures readers are led in the correct order through the material. The hierarchical version allows users to choose their own route through the text. This may be inappropriate for learning materials, where an expert author attempts to convey more than simple snippets of knowledge. In larger hypertexts it may also cause navigation problems, with users missing material or becoming 'lost in hyperspace' (Edwards and Hardman, 1989).
WWW servers maintain accurate logs of page accesses and times, which would allow a precise picture of users' paths through a hypertext to be built up. This would overcome the problem with navigation research identified by Dillon (1992), of measuring the reading process as opposed to the reading outcome.
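As an illustration of how such logs might be used, the sketch below rebuilds each reader's path from server log lines. It assumes the widely used Common Log Format and that each requesting host corresponds to one reader (plausible for one participant per laboratory machine, but an assumption nonetheless). The function name and any file names used with it are hypothetical, and no logs from this study are reproduced.

```python
import re
from collections import defaultdict
from datetime import datetime

# host ident user [timestamp] "METHOD path protocol" status bytes
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:GET|POST) (\S+)[^"]*" (\d{3}) \S+'
)


def reading_paths(log_lines):
    """Map each host to the ordered list of pages it fetched successfully."""
    visits = defaultdict(list)
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match or match.group(4) != "200":
            continue  # skip malformed lines and unsuccessful requests
        host, stamp, path = match.group(1), match.group(2), match.group(3)
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        visits[host].append((when, path))
    # Sort each host's requests by time and keep only the page names.
    return {host: [page for _, page in sorted(seen)]
            for host, seen in visits.items()}
```

The same timestamps could be differenced to estimate how long a reader stayed on each page, giving a measure of the reading process rather than only its outcome.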
Perhaps the two texts were not long enough for the unified structure's other advantages to show through. It is difficult to develop a complex rhetorical argument in less than a thousand words.
The test questions measuring reconstructive learning required participants to integrate material usually from only one paragraph. It would be interesting to investigate how a broken-up hypertext compares with a unified text on questions which evaluate longer sections of text, over such section breaks.
Hypertext should not be completely written off at this point. This experiment considered structure, which is only one small variable. The materials used were simple combinations of text and images. The real potential of computer-aided learning lies in richer, more interactive materials. Recent work along these lines (e.g. Large, Beheshti, Breuleux and Renaud, 1994) has found that learning is often improved by taking advantage of techniques such as animation which are not possible in print. Dynamic interaction between learner and computer is even more important, particularly in the opportunity for individualised attention that systems can give each student. Communication between students while reading hypertexts takes this a step further, opening up many possibilities for peer-group learning. Asking the students themselves to write the materials as a collaborative effort provokes far more active learning than simply spoon-feeding them reams of materials (Downing and Brown, 1997).
Further work with class-based, longer and richer hypertexts would therefore be the best extension to this experiment. It would allow proper evaluation of the unified hypertext, taking into account factors such as the reinforcement of concepts and themes which Whalley (1993) considers important. Participants would use more realistic strategies for learning the material than those available in five or ten minutes. Other computer media could be included in the comparison. Most importantly, it would best approximate real-life learning situations. Ultimately, improving educational practice must always be the goal of this type of research.

Figure 1: Graph of scores on the reproductive questions for each web structure

Figure 2: Graph of scores on the reconstructive questions for each web structure

Figure 3: Graph of scores in Experiment 2 (the hypertexts gave information on non-parametric tests)

Table 4: Mean scores and standard deviations in Experiment 2