Link Archive
One of the objectives of gitenberg is to provide a GitHub-flavored pathway for improving
the metadata of Project Gutenberg (PG) ebooks. This runs in two directions:
1. Improving the accessibility and usability of PG metadata.
2. Improving the quality and completeness of PG metadata.
In corpus linguistics, a hapax legomenon (/ˈhæpəks lɨˈɡɒmɨnɒn/, also /ˈhæpæks/ or
/ˈheɪpæks/;[1][2] pl. hapax legomena;
sometimes abbreviated to hapax, pl. hapaxes) is a word that occurs only once within a
context, either in the written record of an entire language, in the works of an author, or
in a single text. The term is sometimes incorrectly used to describe a word that occurs in
just one of an author's works, even though it occurs more than once in that work. Hapax
legomenon is a transliteration of Greek ἅπαξ λεγόμενον, meaning "(something) said (only)
once".[3]
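A minimal Python sketch of the single-text sense of the definition above: count every word in one text and keep those that occur exactly once. The function name `hapax_legomena` and the tokenizer (lowercased runs of ASCII letters and apostrophes) are illustrative assumptions. Real corpus work would need a proper tokenizer, especially for non-English text.

```python
import re
from collections import Counter


def hapax_legomena(text):
    """Return the words that occur exactly once in `text`, in first-seen order.

    Hypothetical helper. Tokenization is a naive assumption: lowercase the
    text, then treat runs of ASCII letters and apostrophes as words.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)  # Counter preserves first-insertion order
    return [w for w, n in counts.items() if n == 1]


print(hapax_legomena("The cat sat on the mat; the dog sat."))
# → ['cat', 'on', 'mat', 'dog']
```

Counting over a single text gives that text's hapaxes. Pooling the counts over an author's collected works would instead give the author-level hapaxes.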
The rank-size distribution is the distribution of sizes by rank, in decreasing order of
size. For example, if a data set
consists of items of sizes 5, 100, 5, and 8, the rank-size distribution is 100, 8, 5, 5
(ranks 1 through 4). This is also known as the rank-frequency distribution, when the
source data are from a frequency distribution. These are particularly of interest when the
data vary significantly in scale, such as city size or word frequency. These distributions
frequently follow a power-law distribution, or less well-known ones such as a stretched
exponential function or parabolic fractal distribution, at least approximately for certain
ranges of ranks.
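The definitions above can be sketched in Python, assuming plain lists of numbers or tokens. `rank_size` reproduces the 5, 100, 5, 8 example. `rank_frequency` builds a rank-frequency distribution from word counts. `loglog_slope` is a least-squares fit of log size against log rank, a common quick check for approximate power-law behaviour rather than a rigorous test. All three names are hypothetical helpers, not part of any library.

```python
import math
from collections import Counter


def rank_size(sizes):
    """Hypothetical helper: (rank, size) pairs in decreasing order of size.

    Ranks start at 1; equal sizes simply get consecutive ranks.
    """
    return list(enumerate(sorted(sizes, reverse=True), start=1))


def rank_frequency(tokens):
    """Hypothetical helper: (rank, token, count) triples, most frequent first."""
    ranked = Counter(tokens).most_common()
    return [(rank, tok, n) for rank, (tok, n) in enumerate(ranked, start=1)]


def loglog_slope(sizes):
    """Hypothetical helper: least-squares slope of log(size) vs. log(rank).

    A slope near -s suggests size roughly proportional to rank**(-s).
    Assumes all sizes are positive and there are at least two of them.
    """
    pairs = rank_size(sizes)
    xs = [math.log(r) for r, _ in pairs]
    ys = [math.log(s) for _, s in pairs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


print(rank_size([5, 100, 5, 8]))
# → [(1, 100), (2, 8), (3, 5), (4, 5)]
print(rank_frequency("the cat sat on the mat the dog sat".split())[:2])
# → [(1, 'the', 3), (2, 'sat', 2)]
print(round(loglog_slope([60, 30, 20, 15]), 6))
# → -1.0
```

Here tied sizes receive consecutive ranks, matching the 100, 8, 5, 5 example; other tie conventions exist. Least-squares fits on log-log data are known to give biased exponent estimates, so maximum-likelihood methods are usually preferred when the exponent itself matters.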