(1) Do menus provide added value to signposts
in monolingual dictionary entries?
Bartosz Ptasznik & Robert Lew
Dictionary users often run into problems identifying the relevant part of longer dictionary entries consisting of multiple senses. To counter this particular problem, some dictionaries now include special sense-guiding devices. These generally come in one of two types:
(1) entry menus, which provide a sort of ‘table of contents’ at the top of the entry; and
(2) signposts, which are brief pointers at the beginning of each individual sense.
Previous studies indicate that sense-guiding devices assist users in selecting the relevant sense in a polysemous entry. Although two studies (Lew 2010; Nesi and Tan 2011) have found signposts to be more effective than entry-initial menus, no study so far has assessed a combination of signposts and menus within the same entry. The present study compares entry consultation time and accuracy of sense selection in entries supplied with a combination of signposts and menus against signposts alone. Experimental data were collected from 118 intermediate Polish students of English. A linear mixed-effects model analysis with planned comparisons was carried out on the data. Results indicate that adding menus to signposts brings no benefits and that signposts alone work best. As this appears to be the first application of mixed-effects model analysis in metalexicography, another aim is to test its usefulness.
(2) The role of corpus evidence and user log files
in guiding decisions on dictionary inclusion
Robert Lew
One of the fundamental decisions that lexicographers need to take when compiling a dictionary concerns its coverage. How this decision is approached has been changing over time, in step with the evolution of dictionary-making as a whole. The introduction of corpus evidence to lexicography, pioneered with the COBUILD project, established corpus frequency as an important criterion for dictionary inclusion. The idea is to include in the dictionary those items whose frequency is at or above a certain threshold, and to exclude those below it. But quite soon lexicographers discovered that the frequency criterion cannot be applied too stringently. It is well known, for instance, that in virtually any corpus of English Wednesday is a much less frequent word than Monday or Friday. Yet nearly all lexicographers (and even more so dictionary users) would feel it would be perverse to include some days of the week, but not others.
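The frequency criterion and its closed-set exception can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function name select_headwords, the CLOSED_SETS list, the threshold value and all frequency counts are invented for the example and do not come from any actual corpus or dictionary project.

```python
from collections import Counter

# Hypothetical closed lexical sets: if any member qualifies for inclusion,
# all members are included, regardless of their individual frequency.
CLOSED_SETS = [
    {"monday", "tuesday", "wednesday", "thursday",
     "friday", "saturday", "sunday"},
]


def select_headwords(freq, threshold, closed_sets=CLOSED_SETS):
    """Return items at or above the threshold, completed by closed sets."""
    # Step 1: the plain frequency criterion.
    selected = {word for word, count in freq.items() if count >= threshold}
    # Step 2: complete any closed set that is already partly selected.
    for group in closed_sets:
        if selected & group:
            selected |= group
    return selected


# Invented counts, for illustration only.
freq = Counter({"monday": 120, "friday": 150, "wednesday": 40, "zymurgy": 2})
headwords = select_headwords(freq, threshold=100)
print(sorted(headwords))
```

With these invented counts, wednesday falls below the threshold but is still selected because other days of the week qualify, while zymurgy remains excluded.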
Another factor in assessing the role of corpus evidence is the balancing of the corpus in terms of the text types, discourse, and language varieties that it represents to varying degrees. With the benefit of hindsight, we know that the original COBUILD corpus had an excessive proportion of journalistic discourse in it, and this has certainly affected the coverage of the dictionary.
As dictionaries have increasingly gone online, a possibility has presented itself of recording details of user visits, including the search terms that users enter in the dictionary. As these accrue over time, log files build up to produce a statistical reflection of what words and expressions actual dictionary users wish to look up in a given dictionary. Therefore, by analyzing the log files of their online dictionary portal, lexicographers are potentially able to identify the sets of items that are:
(1) looked up by users and covered in the dictionary;
(2) not sought by users and absent from the dictionary;
(3) not sought by users though still covered in the dictionary; and
(4) sought by users but missing from the dictionary.
Of the four cases, (1) and (2) are of course good news for lexicographers and require no intervention. Items in set (3) probably do little harm; in contrast, set (4) indicates failure. This failure may sometimes be the user’s fault but otherwise it may guide lexicographers in what items should be added to their dictionary.
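The four sets can be expressed as simple set operations, as in the following minimal Python sketch. It rests on several assumptions: the function name classify_items and the sample words are hypothetical, and set (2) can only be computed relative to some reference vocabulary (here called universe, for instance a corpus word list), since log files and the dictionary alone say nothing about items that are neither sought nor covered.

```python
def classify_items(looked_up, covered, universe):
    """Split items into the four sets discussed above.

    looked_up: search terms found in the log files
    covered:   items present in the dictionary
    universe:  reference vocabulary (an assumption, e.g. a corpus word list)
    """
    looked_up, covered = set(looked_up), set(covered)
    # Make sure the reference vocabulary contains everything we know about.
    universe = set(universe) | looked_up | covered
    return {
        "sought_and_covered": looked_up & covered,          # set (1)
        "unsought_and_absent": universe - looked_up - covered,  # set (2)
        "unsought_but_covered": covered - looked_up,        # set (3)
        "sought_but_missing": looked_up - covered,          # set (4)
    }


# Invented sample data, for illustration only.
log_terms = {"dog", "selfie", "teh"}
dictionary = {"dog", "aardvark"}
corpus_vocabulary = {"dog", "aardvark", "selfie", "xylem"}

sets = classify_items(log_terms, dictionary, corpus_vocabulary)
for name, items in sets.items():
    print(name, sorted(items))
```

In this invented sample, set (4) contains both selfie, a plausible candidate for inclusion, and teh, a misspelling of the kind that counts as the user's fault, so the set would still need manual review before it informs inclusion decisions.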
In recent years, a number of studies of dictionary user log files have been attempted for various online dictionaries (Bergenholtz and Johnson 2005; De Schryver et al. 2006; Verlinde and Binon 2010; Lorentzen and Theilgaard 2012; Schoonheim et al. 2012; Koplenig et al. 2014). In my presentation, I would like to provide an overview of the findings of these recent log-file analyses insofar as they are able to inform decisions on item inclusion in the dictionary. On this basis, I plan to draw conclusions on the relative roles of corpus evidence and log-file data in guiding lexicographic item selection.