What are dictionary definitions good for?
Włodzimierz Sobkowiak
0. Assumptions
1. Phonetic Difficulty Index (PDI)
2. PDI annotation of:
(a) lexica
(b) text corpora
3. PDI annotation of MEDAL definitions
4. Applications:
(a) phonolexicographic analysis
(b) didactic use
0. Assumptions
(a) Dictionary definitions are actually read during (monolingual) dictionary lookup.
(b) "Inner [...] pronunciation [...] is a constituent part of reading by far the most of people" (Gibson and Levin 1975:342).
(c) This is even more true of FL learners.
(d) Phonetically difficult definitions hinder subvocal reading.
(e) EFL dictionary definitions should be 'user-friendly'.
(f) It is possible to measure phonetic difficulty.
Gibson, E.J. & H. Levin. 1975. The psychology of reading. Cambridge, Mass.: The MIT Press.
1. Phonetic Difficulty Index (PDI)
The PDI algorithm was run over the Oxford Advanced Learner's Dictionary of Current English (OALDCE) word-list, which contains 85430 wordforms and 25264 lemmas. Each phonetic difficulty detected in a wordform was counted as one point.
The resulting PDI values range from 0 (easy) to 10 (hard), with a mean of 2.45 and a standard deviation of 1.56.
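The scoring rule can be illustrated with a minimal Python sketch. It assumes, as the tagged examples in Tables 2 to 4 suggest, that a word's PDI equals the number of single-character difficulty codes in its tag, with '*' or an empty tag meaning no difficulty. The function names are hypothetical. The sketch does not reproduce the actual detection rules of the PDI algorithm; it only aggregates existing tags. The source does not say whether the reported standard deviation is the population or the sample statistic, so the use of pstdev is an assumption.

```python
from statistics import mean, pstdev


def pdi_from_codes(code_tag: str) -> int:
    """Hypothetical helper: one point per difficulty code in a word's tag.

    '*' (no difficulty) and whitespace are ignored, so '' and '*' score 0.
    """
    return sum(1 for ch in code_tag if ch != "*" and not ch.isspace())


def summarize(tags: dict[str, str]) -> tuple[float, float]:
    """Return (mean PDI, standard deviation) over a word-list.

    Using the population standard deviation (pstdev) is an assumption.
    """
    scores = [pdi_from_codes(tag) for tag in tags.values()]
    return mean(scores), pstdev(scores)


# Tags copied from the sample lexicon in Table 2.
sample = {"boggling": "H1", "bogy": "", "bohemians": "CJNQV"}
print({word: pdi_from_codes(tag) for word, tag in sample.items()})
# {'boggling': 2, 'bogy': 0, 'bohemians': 5}
print(summarize(sample))
```

The per-word scores reproduce the PDI values listed in Table 2.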
Apart from measuring the overall phonetic difficulty of a lexical item, the algorithm also tags each item with codes drawn from a set of 57 Polglish difficulty codes.
Table 1. Some examples of PDI codes with their lexical frequency and sources of likely errors
PDI code | frequency (OALDCE wordforms) | source of likely Polglish error |
a – compound | 11148 | stress, geminates |
g – <ou> in word | 3992 | many phonetic realizations |
r – <gh_> or <ght_> in stem | 534 | many phonetic realizations |
A – linking /r/ | 4787 | /r/ or not? (BrE), trilled? |
B – /e@/ | 1129 | /j/ breaking, smoothing, schwa |
H – velar nasal | 10044 | /Ng/, /Nk/, /n/ |
J – short schwa | 32192 | schwa quality |
N – final voiced obstruent | 31427 | devoicing |
U – post-alveolar affricates | 7631 | Polish apical substitutes |
1 – British≠American | 31710 | accent confusion |
2 – more than 5 syllables | 750 | stress and articulation problems |
3 – secondary stress | 10351 | reduced to unstressed |
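For annotation work it is convenient to keep the codes in a lookup table. The Python sketch below holds only a subset of the 57 codes, taken from Table 1. The name PDI_CODES and the fallback label for unlisted codes are hypothetical, illustrative choices.

```python
# Subset of the Polglish difficulty codes, copied from Table 1.
# The full inventory has 57 codes; anything not listed here is reported as unlisted.
PDI_CODES = {
    "a": "compound",
    "g": "<ou> in word",
    "r": "<gh_> or <ght_> in stem",
    "A": "linking /r/",
    "H": "velar nasal",
    "J": "short schwa",
    "N": "final voiced obstruent",
    "U": "post-alveolar affricates",
    "1": "British≠American",
    "2": "more than 5 syllables",
    "3": "secondary stress",
}


def decode(code_tag: str) -> list[str]:
    """Expand a per-word PDI tag (e.g. 'H1') into difficulty labels, ignoring '*'."""
    return [PDI_CODES.get(ch, f"unlisted code {ch!r}") for ch in code_tag if ch != "*"]


print(decode("H1"))  # ['velar nasal', 'British≠American']
```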
2a. PDI annotation of lexica
Table 2. A sample of a PDI-annotated lexicon (OALDCE word-list)
word | stem | British transcription | syllable structure | POS | number of syllables | PDI value | PDI code |
boggling | boggle | 'b0glIN | 'CVCCVC | Ib% | 2 | 2 | H1 |
bogy | bogy | 'b5gI | 'CVCV | K8$ | 2 | 0 | |
bohemians | bohemian | b5'himI@nz | CV'CVCVVCC | Kj% | 4 | 5 | CJNQV |
PDI code | Polglish difficulty | incidence in the OALDCE word-list | incidence in MEDAL definitions (records) |
H | velar nasal | 10044 (11.8%) | 49759 (56.2%) |
1 | British≠American | 31710 (37.1%) | 81387 (92.0%) |
C | /I@/ | 3337 (3.9%) | 10205 (11.5%) |
J | short schwa | 32192 (37.7%) | 83506 (94.4%) |
N | final voiced obstruent | 31427 (36.8%) | 75014 (84.8%) |
Q | vowel over-nasalization | 7612 (8.9%) | 24477 (27.7%) |
V | glottal fricative /h/ | 4267 (5.0%) | 26507 (30.0%) |
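The percentages in this table are consistent with dividing each count by the 85430 OALDCE wordforms and by the 88495 MEDAL definitions respectively. In other words, a record is counted once if its tag contains the code at least once. A minimal Python sketch of that count follows. Representing each record as a single concatenated tag string is an assumption, and the function name is hypothetical.

```python
def incidence(record_tags: list[str], code: str) -> tuple[int, float]:
    """Count records whose tag contains `code` at least once.

    Returns (count, percentage of all records). Each record is one string of
    PDI codes, e.g. all per-word tags of a definition joined together.
    """
    hits = sum(1 for tag in record_tags if code in tag)
    return hits, 100.0 * hits / len(record_tags)


# Toy example: the three OALDCE records of Table 2.
print(incidence(["H1", "", "CJNQV"], "J"))

# Reproducing two table cells from the published counts:
print(round(100 * 10044 / 85430, 1))  # 11.8  (H in the OALDCE word-list)
print(round(100 * 49759 / 88495, 1))  # 56.2  (H in MEDAL definitions)
```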
2b. PDI annotation of text corpora
Table 3. Some examples of PDI-tagged sentences from TIMIT
TIMIT sentence | phonetic transcription | PDI coding | mean PDI | number of words | global PDI |
Theocracy reconsidered | /TI'0kr@sI ,rik@n'sId@d/ | dJM1 JNQ13 | 4.5 | 2 | 9 |
There were other farmhouses nearby | D7 w9R 'VD@ 'fAmh2zIz 'n6b1 | ABL1 AK1 AEJL1 agNV1 C1 | 3.8 | 5 | 19 |
We can die, too, we can die like real people | wi k&n d1 tu wi k&n d1 l1k r6l 'pipl | * * * * * * * * C dX | 0.3 | 10 | 3 |
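The sentence-level figures in Table 3 follow from the per-word coding. The global PDI is the total number of code characters, with '*' counting as zero. The mean PDI is the global PDI divided by the number of words. The Python function below is a hypothetical sketch of that arithmetic, checked against the three TIMIT rows.

```python
def sentence_pdi(coding: str) -> tuple[int, int, float]:
    """Return (global PDI, number of words, mean PDI per word).

    `coding` holds one whitespace-separated tag per word; '*' marks a word
    with no detected difficulty and contributes 0 points.
    """
    tags = coding.split()
    global_pdi = sum(len(tag.replace("*", "")) for tag in tags)
    return global_pdi, len(tags), global_pdi / len(tags)


for coding in ["dJM1 JNQ13", "ABL1 AK1 AEJL1 agNV1 C1", "* * * * * * * * C dX"]:
    g, n, m = sentence_pdi(coding)
    print(f"global={g} words={n} mean={m:.1f}")
# global=9 words=2 mean=4.5
# global=19 words=5 mean=3.8
# global=3 words=10 mean=0.3
```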
3. PDI annotation of MEDAL definitions
Table 4. An example of a PDI-tagged MEDAL definition (taster)
Definition | a small amount of something that is offered so that you can experience it and decide whether you like it or not |
Transcription | @ smOl @'m2nt 0v 'sVmTIN D&t Iz '0f@d s5 D&t ju k&n Ik'sp6r6ns It &nd dI's1d 'weD@ ju l1k It O n0t |
PDI codes | J 1 gJ N1 EHM L N JN1 * L g * C * N NO AJL1 g * * A1 1 |
Mean PDI | 1.3 |
Number of words | 22 |
Global PDI | 28 |
The mean word-weighted PDI computed over the 88495 MEDAL definitions is 1.52 (s.d. = 0.42). As a tentative comparison, I computed the mean PDI of a randomly selected short text (1698 words) downloaded from the internet. It was 1.92, which is higher than the MEDAL mean.
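The slide does not define "word-weighted". One plausible reading, assumed here, is total PDI points divided by total words across all definitions, so that long definitions weigh more than short ones. The following hypothetical Python sketch contrasts this reading with a plain mean of per-definition means.

```python
from statistics import mean


def word_weighted_mean(records: list[tuple[int, int]]) -> float:
    """records: (global PDI, number of words) for each definition.

    Assumed reading of 'word-weighted': total PDI points / total words.
    """
    total_pdi = sum(g for g, _ in records)
    total_words = sum(n for _, n in records)
    return total_pdi / total_words


def unweighted_mean(records: list[tuple[int, int]]) -> float:
    """Plain average of the per-definition mean PDIs, for comparison."""
    return mean(g / n for g, n in records)


# 'taster' from Table 4 (28 points, 22 words) plus a hypothetical
# two-word definition scoring 1 point.
records = [(28, 22), (1, 2)]
print(round(word_weighted_mean(records), 2))  # 1.21
print(round(unweighted_mean(records), 2))     # 0.89
```

The two figures diverge whenever definition lengths vary, so the choice of weighting matters when comparing corpora.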
4a. Applications: phonolexicographic analysis
Table 5. Cross-dictionary comparisons
measure | COBUILD3 | LDOCE4 | OALD7 | CALD=CIDE2 | MEDAL |
definitions in sample (total N=433) | 85 | 88 | 90 | 83 | 87 |
mean PDI (per word) | 1.4 | 1.5 | 1.5 | 1.5 | 1.5 |
mean PDI (per definition) | 28 | 21 | 22 | 22 | 21 |
mean number of words | 19 | 14 | 15 | 14 | 14 |
Definition phonetic difficulty
A word should be defined using words simpler than itself (Ayto 1984). In the MEDAL sample of 87 definitions, 12 headwords have a definition PDI at least 1 point greater than the headword's own PDI: candy, foot, grease, intensity, keel, mail, oozy, recess, requisite, snip, tramp, vaccinate. In one case the definition's PDI exceeds the headword's PDI by more than 2 points: necessary for a particular purpose.
Ayto, J.R. 1984. "The vocabulary of definition". In D. Goetz & T. Herbst (eds). Theoretische und praktische Probleme der Lexikographie. München: Max Hueber Verlag. 50-60.
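The headword check described in the Definition phonetic difficulty paragraph can be expressed as a simple filter. The sketch assumes that the definition score being compared is its mean per-word PDI, since a global definition PDI of 20 or more would trivially exceed any headword score between 0 and 10. The function name and the entries are hypothetical, not MEDAL data.

```python
def harder_than_headword(
    entries: list[tuple[str, float, float]], margin: float = 1.0
) -> list[str]:
    """Return headwords whose definition PDI exceeds the headword PDI by at least `margin`.

    Each entry is (headword, headword PDI, definition mean PDI).
    """
    return [hw for hw, hw_pdi, def_pdi in entries if def_pdi - hw_pdi >= margin]


# Hypothetical entries, not MEDAL data.
entries = [("word-a", 0, 1.5), ("word-b", 2, 1.4), ("word-c", 1, 3.2)]
print(harder_than_headword(entries))              # ['word-a', 'word-c']
print(harder_than_headword(entries, margin=2.0))  # ['word-c']
```

The comparison uses `>=` to match "at least 1 point". A strict "more than 2 points" criterion would need `>` instead.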
4b. Applications: didactic
· dynamically adjusting definitions to the learner's needs and requirements, also in terms of pronunciation
· (semi)automatic creation of language tasks and exercises in an electronic dictionary
· offering the user a corpus-like resource within the dictionary
MEDAL phonolapsological query examples:
1. /t+j/ coalescence; PDI<0.6:
bedroom: a room that you sleep in
cone: a cone shape that you put ice cream in and eat
green: not yet ready to be eaten
payphone: a telephone in a public place that you pay to use
2. Linking /r/; PDI<0.6:
chasm/crevasse: a very deep crack in rock or ice
exactly: in every way or every detail
intense: very great or extreme
severe/ly: very strict or extreme
to have one foot in the grave: to be very old or ill and likely to die soon
3. Schwa-less definitions; PDI<0.6:
creep by: if time creeps by it passes very slowly
lean (adj): lean meat has very little fat in it
lean (n): meat that has very little fat in it
not a moment too soon: so late that it is almost too late
tied up: if traffic is tied up it is not moving very quickly
4. Schwa-heavy definitions:
client-server: used for referring to a network (=group of computers) in which each computer is either a client or a server. Clients are the individual computers that run programs or the equipment connected to them such as printers, and servers are the powerful computers that supply the information that makes them work (30 schwas in 51 words, PDI=2.0)
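Queries of this kind combine a difficulty ceiling with the presence or absence of a particular code. The following is a minimal Python sketch, assuming each definition stores one PDI tag per word. Linking /r/ is looked up through code A. "Schwa-less" is approximated by the absence of code J (short schwa), which is a simplification. /t+j/ coalescence has no code among those listed above, so it is not modelled. The Definition class, the query function and the example tags are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Definition:
    headword: str
    tags: list[str]  # one PDI tag per word; '*' = no detected difficulty

    @property
    def mean_pdi(self) -> float:
        points = sum(len(tag.replace("*", "")) for tag in self.tags)
        return points / len(self.tags)


def query(defs, max_pdi, require=None, exclude=None):
    """Headwords whose definition has mean PDI below `max_pdi`, contains the
    code `require` (if given) and lacks the code `exclude` (if given)."""
    hits = []
    for d in defs:
        codes = "".join(d.tags)
        if d.mean_pdi >= max_pdi:
            continue
        if require is not None and require not in codes:
            continue
        if exclude is not None and exclude in codes:
            continue
        hits.append(d.headword)
    return hits


# Invented tags for illustration only.
defs = [
    Definition("entry-1", ["*", "A", "*", "*"]),       # mean 0.25, has linking /r/
    Definition("entry-2", ["J", "*", "*", "*", "*"]),  # mean 0.2, has short schwa
    Definition("entry-3", ["JN1", "J", "A"]),          # mean 5/3, too difficult
]
print(query(defs, 0.6, require="A"))  # ['entry-1']  (linking /r/ query)
print(query(defs, 0.6, exclude="J"))  # ['entry-1']  (schwa-less query)
```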