## Zipf’s Law for Word Frequencies: Word Forms versus Lemmas in Long Texts

Source: Public Library of Science
Publisher: Public Library of Science

Type: Scientific Journal Article

Published on 09/07/2015
EN

Search Relevance

46.9%

Zipf’s law is a fundamental paradigm in the statistics of written and spoken natural language as well as in other communication systems. We raise the question of the elementary units for which Zipf’s law should hold in the most natural way, studying its validity for plain word forms and for the corresponding lemma forms. We analyze several long literary texts comprising four languages, with different levels of morphological complexity. In all cases Zipf’s law is fulfilled, in the sense that a power-law distribution of word or lemma frequencies is valid for several orders of magnitude. We investigate the extent to which the word-lemma transformation preserves two parameters of Zipf’s law: the exponent and the low-frequency cut-off. We are not able to demonstrate a strict invariance of the tail, as for a few texts both exponents deviate significantly, but we conclude that the exponents are very similar, despite the remarkable transformation that going from words to lemmas represents, considerably affecting all ranges of frequencies. In contrast, the low-frequency cut-offs are less stable, tending to increase substantially after the transformation.
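
To make the word-form versus lemma comparison concrete, here is a minimal Python sketch, not the authors' pipeline. The lemma table `TOY_LEMMAS` and the sample sentence are hypothetical. A real study would use a proper lemmatizer and long texts. The sketch only shows how merging word forms into lemmas reshapes the rank-frequency list while the total token count stays the same.

```python
from collections import Counter

# Hypothetical toy lemma table (an assumption for illustration only).
TOY_LEMMAS = {"ran": "run", "runs": "run", "running": "run", "cats": "cat"}


def rank_frequency(tokens):
    """Return token-type frequencies sorted in decreasing order (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def lemmatize(tokens, table=TOY_LEMMAS):
    """Map each word form to its lemma; unknown forms are left unchanged."""
    return [table.get(token, token) for token in tokens]


text = "the cat runs and the cats ran while the dog runs"
words = text.split()

word_freqs = rank_frequency(words)              # frequencies of plain word forms
lemma_freqs = rank_frequency(lemmatize(words))  # frequencies after merging forms

print(word_freqs)
print(lemma_freqs)
```

Merging forms leaves fewer types with higher counts, which is why the transformation can shift both the fitted exponent and the low-frequency cut-off.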

## The Evolution of the Exponent of Zipf's Law in Language Ontogeny

Source: Public Library of Science
Publisher: Public Library of Science

Type: Scientific Journal Article

Published on 13/03/2013
EN

Search Relevance

46.95%

It is well known that word frequencies arrange themselves according to Zipf's law. However, little is known about the relationship between the parameters of the law and the complexity of a communication system. Many models of the evolution of language assume that the exponent of the law remains constant as the complexity of a communication system increases. Using longitudinal studies of child language, we analysed the word rank distribution for the speech of children and adults participating in conversations. The adults typically included family members (e.g., parents) or the investigators conducting the research. Our analysis of the evolution of Zipf's law yields two main unexpected results. First, in children the exponent of the law tends to decrease over time while this tendency is weaker in adults, thus suggesting this is not a mere mirror effect of adult speech. Second, although the exponent of the law is more stable in adults, their exponents fall below 1, which is the typical value of the exponent assumed in both children and adults. Our analysis also shows a tendency of the mean length of utterances (MLU), a simple estimate of syntactic complexity, to increase as the exponent decreases. The parallel evolution of the exponent and a simple indicator of syntactic complexity (MLU) supports the hypothesis that the exponent of Zipf's law and linguistic complexity are inter-related. The assumption that Zipf's law for word ranks is a power law with a constant exponent of one in both adults and children needs to be revised.

## Zipf's Law for All the Natural Cities in the United States: A Geospatial Perspective

Source: Cornell University
Publisher: Cornell University

Type: Scientific Journal Article

Search Relevance

46.9%

This paper provides a new geospatial perspective on whether or not Zipf's law
holds for all cities or for the largest cities in the United States, using a
massive dataset and data-intensive computing. A major problem around this issue is how to
define cities or city boundaries. Most of the investigations of Zipf's law rely
on the demarcations of cities imposed by census data, e.g., metropolitan areas
and census-designated places. These demarcations or definitions (of cities) are
criticized for being subjective or even arbitrary. Alternative solutions to
defining cities are suggested, but they still rely on census data for their
definitions. In this paper we demarcate urban agglomerations by clustering
street nodes (including intersections and ends), forming what we call natural
cities. Based on the demarcation, we found that Zipf's law holds remarkably
well for all the natural cities (over 2-4 million in total) across the United
States. This result shows little sensitivity to the
clustering resolution used for demarcating the natural cities. This contrasts
sharply with urban areas as defined in the census data, for which Zipf's law
does not hold stably.
Keywords: Natural cities, power law, data-intensive geospatial computing...

## Zipf's law and criticality in multivariate data without fine-tuning

Source: Cornell University
Publisher: Cornell University

Type: Scientific Journal Article

Search Relevance

46.82%

Subjects: Quantitative Biology - Neurons and Cognition; Condensed Matter - Statistical Mechanics; Quantitative Biology - Quantitative Methods

The joint probability distribution of many degrees of freedom in biological
systems, such as firing patterns in neural networks or antibody sequence
composition in zebrafish, often follows Zipf's law, where a power law is
observed on a rank-frequency plot. This behavior has recently been shown to
imply that these systems reside near a unique critical point where the
extensive parts of the entropy and energy are exactly equal. Here we show
analytically, and via numerical simulations, that Zipf-like probability
distributions arise naturally if there is an unobserved variable (or variables)
that affects the system, e.g., for neural networks an input stimulus that
causes individual neurons in the network to fire at time-varying rates. In
statistics and machine learning, these models are called latent-variable or
mixture models. Our model shows that no fine-tuning is required, i.e., Zipf's
law arises generically without tuning parameters to a point, and gives insight
into the ubiquity of Zipf's law in a wide range of systems.; Comment: 5 pages, 3 figures

## Zipf's law for word frequencies: word forms versus lemmas in long texts

Source: Cornell University
Publisher: Cornell University

Type: Scientific Journal Article

Search Relevance

46.9%

Subjects: Physics - Physics and Society; Computer Science - Computation and Language; Physics - Data Analysis, Statistics and Probability

Zipf's law is a fundamental paradigm in the statistics of written and spoken
natural language as well as in other communication systems. We raise the
question of the elementary units for which Zipf's law should hold in the most
natural way, studying its validity for plain word forms and for the
corresponding lemma forms. In order to have as homogeneous sources as possible,
we analyze some of the longest literary texts ever written, comprising four
different languages, with different levels of morphological complexity. In all
cases Zipf's law is fulfilled, in the sense that a power-law distribution of
word or lemma frequencies is valid for several orders of magnitude. We
investigate the extent to which the word-lemma transformation preserves two
parameters of Zipf's law: the exponent and the low-frequency cut-off. We are
not able to demonstrate a strict invariance of the tail, as for a few texts
both exponents deviate significantly, but we conclude that the exponents are
very similar, despite the remarkable transformation that going from words to
lemmas represents, considerably affecting all ranges of frequencies. In
contrast, the low-frequency cut-offs are less stable.

## Theory of Zipf's Law and of General Power Law Distributions with Gibrat's law of Proportional Growth

Source: Cornell University
Publisher: Cornell University

Type: Scientific Journal Article

Published on 13/08/2008

Search Relevance

46.9%

Subjects: Quantitative Finance - General Finance; Physics - Data Analysis, Statistics and Probability; Physics - Physics and Society

We summarize a book under publication with this title written by the three
present authors, on the theory of Zipf's law, and more generally of power laws,
driven by the mechanism of proportional growth. The preprint is available upon
request from the authors.
For clarity, consistency of language and conciseness, we discuss the origin
and conditions of the validity of Zipf's law using the terminology of firms'
asset values. We use firms as the entities whose size distributions are to be
explained. It should be noted, however, that most of the relations discussed in
this book, especially the intimate connection between Zipf's and Gibrat's
laws, underlie Zipf's law in diverse scientific areas. The same models and
variations thereof can be straightforwardly applied to any of the other domains
of application.; Comment: 11 pages, 1 figure with 4 panels, summary of a book in press

## Zipf's law and maximum sustainable growth

Source: Cornell University
Publisher: Cornell University

Type: Scientific Journal Article

Published on 01/12/2010

Search Relevance

46.8%

Zipf's law states that the number of firms with size greater than S is
inversely proportional to S. Most explanations start with Gibrat's rule of
proportional growth but require additional constraints. We show that Gibrat's
rule, at all firm levels, yields Zipf's law under a balance condition between
the effective growth rate of incumbent firms (which includes their possible
demise) and the growth rate of investments in entrant firms. Remarkably, Zipf's
law is the signature of the long-term optimal allocation of resources that
ensures the maximum sustainable growth rate of an economy.; Comment: 42 pages, 3 figures
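
As a hedged illustration (not taken from the paper), the opening statement of this abstract can be written out to show why it matches the familiar rank-size form of Zipf's law. The symbols are notation assumed for this sketch: $N(>S)$ is the number of firms larger than $S$, $S_r$ is the size of the firm of rank $r$, and $C$ is a constant.

$$
N(>S) = \frac{C}{S}
\quad\Longrightarrow\quad
r \approx N(>S_r) = \frac{C}{S_r}
\quad\Longrightarrow\quad
S_r \approx \frac{C}{r}.
$$

Differentiating the first relation gives a size density $p(S) \propto S^{-2}$, so a Zipf exponent of 1 for the cumulative count corresponds to an exponent of 2 for the density.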

## Zipf's Law for All the Natural Cities around the World

Source: Cornell University
Publisher: Cornell University

Type: Scientific Journal Article

Search Relevance

47.05%

Two fundamental issues surrounding research on Zipf's law regarding city
sizes are whether and why this law holds. This paper does not deal with the
latter issue with respect to why, and instead investigates whether Zipf's law
holds in a global setting, thus involving all cities around the world. Unlike
previous studies, which have mainly relied on conventional census data such as
populations and census-bureau-imposed definitions of cities, we adopt
naturally delineated cities (in the sense that the data speak for themselves),
or natural cities to be more precise, in order to examine Zipf's law. We find that Zipf's
law holds remarkably well for all natural cities at the global level, and
remains almost valid at the continental level except for Africa at certain time
instants. We further examine the law at the country level, and note that Zipf's
law is violated from country to country or from time to time. This violation is
mainly due to our limitations; we are limited to individual countries, or to a
static view on city-size distributions. The central argument of this paper is
that Zipf's law is universal, and we therefore must use the correct scope in
order to observe it. We further find that Zipf's law applies to city numbers: the
number of cities in the largest country is twice that in the
second largest country...

## Predicted and Verified Deviation from Zipf's Law in Growing Social Networks

Source: Cornell University
Publisher: Cornell University

Type: Scientific Journal Article

Published on 15/07/2010

Search Relevance

46.9%

Zipf's power law is a general empirical regularity found in many natural and
social systems. A recently developed theory predicts that Zipf's law
corresponds to systems that are growing according to a maximally sustainable
path in the presence of random proportional growth, stochastic birth and death
processes. We report a detailed empirical analysis of a burgeoning network of
social groups, in which all ingredients needed for Zipf's law to apply are
verifiable and verified. We estimate empirically the average growth $r$ and its
standard deviation $\sigma$ as well as the death rate $h$ and predict without
adjustable parameters the exponent $\mu$ of the power law distribution $P(s)$
of the group sizes $s$. The predicted value $\mu = 0.75 \pm 0.05$ is in
excellent agreement with maximum likelihood estimations. According to theory,
the deviation of $P(s)$ from Zipf's law (i.e., $\mu < 1$) constitutes a direct
statistical quantitative signature of the overall non-stationary growth of the
social universe.; Comment: 4 pages, 2 figures, 2 tables

## Zipf's law, Hierarchical Structure, and Shuffling-Cards Model for Urban Development

Source: Cornell University
Publisher: Cornell University

Type: Scientific Journal Article

Published on 16/04/2011

Search Relevance

46.82%

A new perspective is proposed for finding the simple rules that govern complex
systems and regular patterns behind random phenomena such as cities. Hierarchy
of cities reflects the ubiquitous structure frequently observed in the natural
world and social institutions. Where there is a hierarchy with cascade
structure, there is a rank-size distribution following Zipf's law, and vice
versa. The hierarchical structure can be described with a set of exponential
functions that are identical in form to Horton-Strahler's laws on rivers and
Gutenberg-Richter's laws on earthquake energy. From the exponential models, we
can derive four power laws such as Zipf's law indicative of fractals and
scaling symmetry. Research on hierarchy helps us understand
how complex systems are self-organized. A card-shuffling model is built to
interpret the relation between Zipf's law and the hierarchy of cities. This model
can be extended to explain the general empirical power-law distributions across
the individual physical and social sciences, which are hard to comprehend
within the specific scientific domains.; Comment: 28 pages, 8 figures

## Zipf's Law in the Liquid Gas Phase Transition of Nuclei

Source: Cornell University
Publisher: Cornell University

Type: Scientific Journal Article

Search Relevance

46.8%

Zipf's law, known from the field of linguistics, is tested in nuclear disassembly
within the framework of the isospin-dependent lattice gas model. It is found that
the average cluster charge (or mass) of rank $n$ in the charge (or mass) list
is exactly inversely proportional to its rank, i.e., Zipf's law holds, at the
phase transition temperature. This novel criterion should be helpful in searching
for the nuclear liquid-gas phase transition experimentally and theoretically. In
addition, the finite size scaling of the effective phase transition temperature
at which Zipf's law appears is studied for several systems with different
masses, and the critical exponents $\nu$ and $\beta$ are tentatively extracted.; Comment: 4 Pages, 4 Figures, ReVTEX; Some misprints are corrected

## Zipf's Law in Importance of Genes for Cancer Classification Using Microarray Data

Source: Cornell University
Publisher: Cornell University

Type: Scientific Journal Article

Published on 05/04/2001

Search Relevance

46.97%

Subjects: Physics - Biological Physics; Physics - Data Analysis, Statistics and Probability; Quantitative Biology - Quantitative Methods

Microarray data consists of mRNA expression levels of thousands of genes
under certain conditions. A difference in the expression level of a gene under two
different conditions/phenotypes, such as cancerous versus non-cancerous, one
subtype of cancer versus another, before versus after a drug treatment, is
indicative of the relevance of that gene to the difference of the high-level
phenotype. Each gene can be ranked by its ability to distinguish the two
conditions. We study how the single-gene classification ability decreases with
its rank (a Zipf's plot). A power-law function in the Zipf's plot is observed for
the four microarray datasets obtained from various cancer studies. This
power-law behavior in the Zipf's plot is reminiscent of similar power-law
curves in other natural and social phenomena (Zipf's law). However, due to our
choice of the measure of importance in classification ability, i.e., the
maximized likelihood in a logistic regression, the exponent of the power-law
function is a function of the sample size, instead of a fixed value close to 1
for a typical example of Zipf's law. The presence of this power-law behavior is
important for deciding the number of genes to be used for a discriminant
microarray data analysis.; Comment: 11 pages...

## Zipf's law, 1/f noise, and fractal hierarchy

Source: Cornell University
Publisher: Cornell University

Type: Scientific Journal Article

Published on 22/04/2011

Search Relevance

46.95%

Fractals, 1/f noise, Zipf's law, and the occurrence of large catastrophic
events are typical ubiquitous general empirical observations across the
individual sciences which cannot be understood within the set of references
developed within the specific scientific domains. All these observations are
associated with scaling laws and have attracted broad research interest in the
scientific community. However, the inherent relationships between these scaling
phenomena remain open questions. In this
paper, theoretical derivation and mathematical experiments are employed to
reveal the analogy between fractal patterns, 1/f noise, and the Zipf
distribution. First, the multifractal process follows the generalized Zipf's
law empirically. Second, a 1/f spectrum is identical in mathematical form to
Zipf's law. Third, both 1/f spectra and Zipf's law can be converted into a
self-similar hierarchy. Fourth, fractals, 1/f spectra, Zipf's law, and the
occurrence of large catastrophic events can be described with similar
exponential laws and power laws. The self-similar hierarchy is a more general
framework or structure which can be used to encompass or unify different
scaling phenomena and rules in both physical and social systems such as cities...

## Zipf's Law Leads to Heaps' Law: Analyzing Their Relation in Finite-Size Systems

Source: Cornell University
Publisher: Cornell University

Type: Scientific Journal Article

Search Relevance

47.05%

Background: Zipf's law and Heaps' law are observed in disparate complex
systems. Of particular interest, these two laws often appear together. Many
theoretical models and analyses have been developed to understand their co-occurrence
in real systems, but a clear picture of their relation is still lacking.
Methodology/Principal Findings: We show that the Heaps' law can be considered
as a derivative phenomenon if the system obeys the Zipf's law. Furthermore, we
refine the known approximate solution for the Heaps' exponent given the
Zipf's exponent. We show that the approximate solution is indeed an asymptotic
solution for infinite systems, while in the finite-size system the Heaps'
exponent is sensitive to the system size. Extensive empirical analysis on tens
of disparate systems demonstrates that our refined results can better capture
the relation between the Zipf's and Heaps' exponents. Conclusions/Significance:
The present analysis provides a clear picture about the relation between the
Zipf's law and Heaps' law without the help of any specific stochastic model,
namely the Heaps' law is indeed a derivative phenomenon from Zipf's law. The
presented numerical method gives considerably better estimation of the Heaps'
exponent given the Zipf's exponent and the system size. Our analysis provides
some insights and implications of real complex systems...
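
The claim that Heaps' law follows from Zipf's law can be explored numerically. The Python sketch below is an illustrative assumption, not the paper's numerical method. It draws tokens independently from a finite Zipf distribution and records how the number of distinct types grows with the number of tokens. The function names and the parameters `n_types`, `exponent`, and `seed` are hypothetical choices for this example.

```python
import random


def zipf_weights(n_types, exponent=1.0):
    """Unnormalized Zipf weights: the type of rank r has weight r**(-exponent)."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_types + 1)]


def vocabulary_growth(n_tokens, n_types=5000, exponent=1.0, seed=0):
    """Sample tokens i.i.d. from a Zipf distribution and return the number of
    distinct types seen after each token (a Heaps-style growth curve)."""
    rng = random.Random(seed)
    tokens = rng.choices(range(n_types),
                         weights=zipf_weights(n_types, exponent),
                         k=n_tokens)
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


growth = vocabulary_growth(20000)
# Compare vocabulary size at half length and full length of the sample.
print(growth[9999], growth[-1])
```

Doubling the sample length less than doubles the vocabulary, which is the sublinear growth that Heaps' law describes. Because the number of types is finite, the curve eventually saturates, in line with the finite-size sensitivity noted in the abstract.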

## Empirical Tests of Zipf's law Mechanism In Open Source Linux Distribution

Source: Cornell University
Publisher: Cornell University

Type: Scientific Journal Article

Published on 30/06/2008

Search Relevance

46.9%

The evolution of open source software projects in Linux distributions offers
a remarkable example of a growing complex self-organizing adaptive system,
exhibiting Zipf's law over four full decades. We present three tests of the
usually assumed ingredients of stochastic growth models that have been
previously conjectured to be at the origin of Zipf's law: (i) the growth
observed between successive releases of the number of in-directed links of
packages obeys Gibrat's law of proportional growth; (ii) the average growth
increment of the number of in-directed links of packages over a time interval
$\Delta t$ is proportional to $\Delta t$, while its standard deviation is
proportional to $\sqrt{\Delta t}$; (iii) the distribution of the number of
in-directed links of new packages appearing in evolving versions of Debian
Linux distributions has a tail thinner than Zipf's law, with an exponent which
converges to the Zipf's law value 1 as the time $\Delta t$ between releases
increases.; Comment: 4 pages and 4 figures

## Recursive Subdivision of Urban Space and Zipf's law

Source: Cornell University
Publisher: Cornell University

Type: Scientific Journal Article

Published on 25/09/2012

Search Relevance

46.88%

Zipf's law can be used to describe the rank-size distribution of cities in a
region. It has seldom been employed to study urban internal structure. In this
paper, we demonstrate that the space-filling process within a city follows
Zipf's law and can be characterized with the rank-size rule. A model of spatial
disaggregation of urban space is presented to depict the spatial regularity of
urban growth. By recursive subdivision of space, an urban region can be
geometrically divided into two parts, four parts, eight parts, and so on, and
form a hierarchy with cascade structure. If we rank these parts by size, the
portions will conform to the Zipf distribution. By means of GIS techniques and
remote sensing data, the model of recursive subdivision of urban space is
applied to three cities of China. The results show that the intra-urban
hierarchy complies with Zipf's law, and the values of the rank-size scaling
exponent are very close to 1. The significance of this study lies in three
aspects. First, it shows that the strict subdivision of space is an efficient
approach to revealing spatial order of urban form. Second, it discloses the
relationships between urban space-filling process and the rank-size rule.
Third, it suggests a new way of understanding fractals...

## Zipf's law arises naturally in structured, high-dimensional data

Source: Cornell University
Publisher: Cornell University

Type: Scientific Journal Article

Search Relevance

46.99%

Zipf's law, which states that the probability of an observation is inversely
proportional to its rank, has been observed in many different domains. Although
there are models that explain Zipf's law in each of them, there is not yet a
general mechanism that covers all, or even most, domains. Here we propose such
a mechanism. It relies on the observation that real world data is often
generated from some underlying, often low dimensional, causes - low dimensional
latent variables. Those latent variables mix together multiple models that do
not obey Zipf's law, giving a model that does obey Zipf's law. In particular,
we show that when observations are high dimensional, latent variable models
lead to Zipf's law under very mild conditions - conditions that are typically
satisfied for real world data. We identify an underlying latent variable for
language, neural data, and amino acid sequences, and we speculate that yet to
be uncovered latent variables are responsible for Zipf's law in other domains.

## Large-scale analysis of Zipf's law in English texts

Source: Cornell University
Publisher: Cornell University

Type: Scientific Journal Article

Published on 15/09/2015

Search Relevance

46.9%

Despite being a paradigm of quantitative linguistics, Zipf's law for words
suffers from three main problems: its formulation is ambiguous, its validity
has not been tested rigorously from a statistical point of view, and it has not
been confronted with a representatively large number of texts. So, we can
summarize the current support of Zipf's law in texts as anecdotal.
We try to solve these issues by studying three different versions of Zipf's
law and fitting them to all available English texts in the Project Gutenberg
database (consisting of more than 30000 texts). To do so we use state-of-the-art
tools in fitting and goodness-of-fit tests, carefully tailored to the
peculiarities of text statistics. Remarkably, one of the three versions of
Zipf's law, consisting of a pure power-law form in the complementary cumulative
distribution function of word frequencies, is able to fit more than 40% of the
texts in the database (at the 0.05 significance level), for the whole domain of
frequencies (from 1 to the maximum value) and with only one free parameter (the
exponent).
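
To show what fitting a one-parameter power law by maximum likelihood looks like, here is a minimal Python sketch. It uses the standard continuous-variable estimator, which is an assumption made for brevity and is not the discrete fitting procedure tailored to text statistics that the abstract refers to. The helper names `powerlaw_mle` and `sample_powerlaw` and the synthetic data are hypothetical.

```python
import math
import random


def powerlaw_mle(samples, x_min):
    """Continuous maximum-likelihood estimate of the density exponent alpha
    for p(x) ~ x**(-alpha), x >= x_min: alpha = 1 + n / sum(ln(x / x_min))."""
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(n, alpha, x_min=1.0, seed=0):
    """Draw n synthetic samples by inverse-transform sampling."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


# Synthetic check: a density exponent of 2 corresponds to a complementary
# cumulative distribution that decays as 1/x.
data = sample_powerlaw(50000, alpha=2.0)
alpha_hat = powerlaw_mle(data, x_min=1.0)
print(round(alpha_hat, 2))
```

The exponent of the complementary cumulative distribution is `alpha_hat - 1`. A full analysis would also need a goodness-of-fit test, which this sketch omits.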

## Maximal nonsymmetric entropy leads naturally to Zipf's law

Source: Cornell University
Publisher: Cornell University

Type: Scientific Journal Article

Published on 14/09/2006

Search Relevance

46.82%

As the most fundamental empirical law, Zipf's law has been studied from many
aspects. But its meaning is still an open problem. Some models have been
constructed to explain Zipf's law. In this letter, a new concept named
nonsymmetric entropy is introduced; maximizing nonsymmetric entropy leads
naturally to Zipf's law.; Comment: 3 pages

## Zipf’s law for cities: an empirical examination

Source: London School of Economics and Political Science Research
Publisher: London School of Economics and Political Science Research

Type: Article; PeerReviewed
Format: application/pdf

Published in 03/2003
EN

Search Relevance

46.86%

We use data for metro areas in the United States, from the US Census for 1900-1990, to test the validity of Zipf's Law for cities. Previous investigations are restricted to regressions of log size against log rank. In contrast, we use a nonparametric procedure to calculate local Zipf exponents from the mean and variance of city growth rates. This also allows us to test for the validity of Gibrat's Law for city growth processes. Despite variation in growth rates as a function of city size, Gibrat's Law does hold. In addition, the local Zipf exponents are broadly consistent with Zipf's Law. Deviations from Zipf's Law are easily explained by deviations from Gibrat's Law.
