
## Zipf’s Law for Word Frequencies: Word Forms versus Lemmas in Long Texts

Corral, Álvaro; Boleda, Gemma; Ferrer-i-Cancho, Ramon
Source: Public Library of Science; Publisher: Public Library of Science
Type: Scientific Journal Article
Search Relevance
46.9%
Zipf’s law is a fundamental paradigm in the statistics of written and spoken natural language as well as in other communication systems. We raise the question of the elementary units for which Zipf’s law should hold in the most natural way, studying its validity for plain word forms and for the corresponding lemma forms. We analyze several long literary texts comprising four languages, with different levels of morphological complexity. In all cases Zipf’s law is fulfilled, in the sense that a power-law distribution of word or lemma frequencies is valid for several orders of magnitude. We investigate the extent to which the word-lemma transformation preserves two parameters of Zipf’s law: the exponent and the low-frequency cut-off. We are not able to demonstrate a strict invariance of the tail, as for a few texts both exponents deviate significantly, but we conclude that the exponents are very similar, despite the remarkable transformation that going from words to lemmas represents, considerably affecting all ranges of frequencies. In contrast, the low-frequency cut-offs are less stable, tending to increase substantially after the transformation.
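As a rough illustration of the word-versus-lemma comparison described in this abstract, the following minimal sketch counts frequencies for plain word forms and for lemmas and sorts them by rank. The toy sentence and the `LEMMAS` table are hypothetical stand-ins; the study itself would rely on proper lemmatization of full-length literary texts, and nothing here reproduces the authors' fitting procedure.

```python
from collections import Counter

# Hypothetical toy lemma table (an assumption for illustration only);
# a real analysis would use a proper lemmatizer for each language.
LEMMAS = {"was": "be", "are": "be", "cats": "cat", "ran": "run", "runs": "run"}


def rank_frequency(tokens):
    """Return the frequencies sorted in decreasing order (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def lemmatize(tokens, table=LEMMAS):
    """Map each word form to its lemma; unknown forms are kept unchanged."""
    return [table.get(token, token) for token in tokens]


text = "the cat runs and the cats ran while the dog was calm and the birds are loud"
tokens = text.split()

word_freqs = rank_frequency(tokens)              # one count per distinct word form
lemma_freqs = rank_frequency(lemmatize(tokens))  # one count per distinct lemma

print("word forms:", word_freqs)
print("lemmas:    ", lemma_freqs)
```

Lemmatization merges several word forms into one type, so the number of types drops while the total number of tokens is unchanged; this is the transformation whose effect on the exponent and the low-frequency cut-off the abstract discusses.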

## The Evolution of the Exponent of Zipf's Law in Language Ontogeny

Baixeries, Jaume; Elvevåg, Brita; Ferrer-i-Cancho, Ramon
Source: Public Library of Science; Publisher: Public Library of Science
Type: Scientific Journal Article
Search Relevance
46.95%
It is well known that word frequencies arrange themselves according to Zipf's law. However, little is known about the dependency between the parameters of the law and the complexity of a communication system. Many models of the evolution of language assume that the exponent of the law remains constant as the complexity of a communication system increases. Using longitudinal studies of child language, we analysed the word rank distribution for the speech of children and adults participating in conversations. The adults typically included family members (e.g., parents) or the investigators conducting the research. Our analysis of the evolution of Zipf's law yields two main unexpected results. First, in children the exponent of the law tends to decrease over time while this tendency is weaker in adults, thus suggesting this is not a mere mirror effect of adult speech. Second, although the exponent of the law is more stable in adults, their exponents fall below 1, which is the typical value of the exponent assumed in both children and adults. Our analysis also shows a tendency of the mean length of utterances (MLU), a simple estimate of syntactic complexity, to increase as the exponent decreases. The parallel evolution of the exponent and a simple indicator of syntactic complexity (MLU) supports the hypothesis that the exponent of Zipf's law and linguistic complexity are inter-related. The assumption that Zipf's law for word ranks is a power-law with a constant exponent of one in both adults and children needs to be revised.

## Zipf's Law for All the Natural Cities in the United States: A Geospatial Perspective

Jiang, Bin; Jia, Tao
Type: Scientific Journal Article
Search Relevance
46.9%
This paper provides a new geospatial perspective on whether or not Zipf's law holds for all cities or for the largest cities in the United States, using a massive dataset and the computing it requires. A major problem around this issue is how to define cities or city boundaries. Most of the investigations of Zipf's law rely on the demarcations of cities imposed by census data, e.g., metropolitan areas and census-designated places. These demarcations or definitions (of cities) are criticized for being subjective or even arbitrary. Alternative solutions to defining cities are suggested, but they still rely on census data for their definitions. In this paper we demarcate urban agglomerations by clustering street nodes (including intersections and ends), forming what we call natural cities. Based on the demarcation, we found that Zipf's law holds remarkably well for all the natural cities (over 2-4 million in total) across the United States. The result is largely insensitive to the clustering resolution used for demarcating the natural cities. This contrasts sharply with urban areas as defined in the census data, for which Zipf's law does not hold stably. Keywords: Natural cities, power law, data-intensive geospatial computing...

## Zipf's law and criticality in multivariate data without fine-tuning

Schwab, David J.; Nemenman, Ilya; Mehta, Pankaj
Type: Scientific Journal Article
Search Relevance
46.82%
The joint probability distribution of many degrees of freedom in biological systems, such as firing patterns in neural networks or antibody sequence composition in zebrafish, often follows Zipf's law, where a power law is observed on a rank-frequency plot. This behavior has recently been shown to imply that these systems reside near a unique critical point where the extensive parts of the entropy and energy are exactly equal. Here we show analytically, and via numerical simulations, that Zipf-like probability distributions arise naturally if there is an unobserved variable (or variables) that affects the system, e.g., for neural networks, an input stimulus that causes individual neurons in the network to fire at time-varying rates. In statistics and machine learning, these models are called latent-variable or mixture models. Our model shows that no fine-tuning is required, i.e., Zipf's law arises generically without tuning parameters to a point, and gives insight into the ubiquity of Zipf's law in a wide range of systems.; Comment: 5 pages, 3 figures

## Zipf's law for word frequencies: word forms versus lemmas in long texts

Corral, Alvaro; Boleda, Gemma; Ferrer-i-Cancho, Ramon
Type: Scientific Journal Article
Search Relevance
46.9%
Zipf's law is a fundamental paradigm in the statistics of written and spoken natural language as well as in other communication systems. We raise the question of the elementary units for which Zipf's law should hold in the most natural way, studying its validity for plain word forms and for the corresponding lemma forms. In order to have as homogeneous sources as possible, we analyze some of the longest literary texts ever written, comprising four different languages, with different levels of morphological complexity. In all cases Zipf's law is fulfilled, in the sense that a power-law distribution of word or lemma frequencies is valid for several orders of magnitude. We investigate the extent to which the word-lemma transformation preserves two parameters of Zipf's law: the exponent and the low-frequency cut-off. We are not able to demonstrate a strict invariance of the tail, as for a few texts both exponents deviate significantly, but we conclude that the exponents are very similar, despite the remarkable transformation that going from words to lemmas represents, considerably affecting all ranges of frequencies. In contrast, the low-frequency cut-offs are less stable.

## Theory of Zipf's Law and of General Power Law Distributions with Gibrat's law of Proportional Growth

Saichev, A.; Malevergne, Y.; Sornette, D.
Type: Scientific Journal Article
Search Relevance
46.9%
We summarize a book under publication with this title written by the three present authors, on the theory of Zipf's law, and more generally of power laws, driven by the mechanism of proportional growth. The preprint is available upon request from the authors. For clarity, consistency of language, and conciseness, we discuss the origin and conditions of the validity of Zipf's law using the terminology of firms' asset values. We use firms as the entities whose size distributions are to be explained. It should be noted, however, that most of the relations discussed in this book, especially the intimate connection between Zipf's and Gibrat's laws, underlie Zipf's law in diverse scientific areas. The same models and variations thereof can be straightforwardly applied to any of the other domains of application.; Comment: 11 pages, 1 figure with 4 panels, summary of a book in press

## Zipf's law and maximum sustainable growth

Malevergne, Y.; Saichev, A.; Sornette, D.
Type: Scientific Journal Article
Search Relevance
46.8%
Zipf's law states that the number of firms with size greater than S is inversely proportional to S. Most explanations start with Gibrat's rule of proportional growth but require additional constraints. We show that Gibrat's rule, at all firm levels, yields Zipf's law under a balance condition between the effective growth rate of incumbent firms (which includes their possible demise) and the growth rate of investments in entrant firms. Remarkably, Zipf's law is the signature of the long-term optimal allocation of resources that ensures the maximum sustainable growth rate of an economy.; Comment: 42 pages, 3 figures
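The statement of Zipf's law in the first sentence of this abstract can be written out as a short worked equation. The symbols $N(>S)$ (number of firms larger than $S$), $S_r$ (size of the firm of rank $r$), and the constant $C$ are notation assumed here for illustration, not taken from the paper.

$$
N(>S) = \frac{C}{S}
\quad\Longrightarrow\quad
r = N(>S_r) = \frac{C}{S_r}
\quad\Longrightarrow\quad
S_r = \frac{C}{r}.
$$

In words: if the count of firms above size $S$ falls off as $1/S$, then the firm of rank $r$ (which has roughly $r$ firms at or above its size) has a size inversely proportional to its rank, which is the familiar rank-size form of the law with exponent 1.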

## Zipf's Law for All the Natural Cities around the World

Jiang, Bin; Yin, Junjun; Liu, Qingling
Type: Scientific Journal Article
Search Relevance
47.05%
Two fundamental issues surrounding research on Zipf's law regarding city sizes are whether and why this law holds. This paper does not deal with the latter issue with respect to why, and instead investigates whether Zipf's law holds in a global setting, thus involving all cities around the world. Unlike previous studies, which have mainly relied on conventional census data such as populations, and census-bureau-imposed definitions of cities, we adopt naturally delineated cities (in the sense that the data speak for themselves), or natural cities, to be more precise, in order to examine Zipf's law. We find that Zipf's law holds remarkably well for all natural cities at the global level, and remains almost valid at the continental level except for Africa at certain time instants. We further examine the law at the country level, and note that Zipf's law is violated from country to country or from time to time. This violation is mainly due to our limitations; we are limited to individual countries, or to a static view on city-size distributions. The central argument of this paper is that Zipf's law is universal, and we therefore must use the correct scope in order to observe it. We further find Zipf's law applied to city numbers; the number of cities in the largest country is twice that in the second largest country...

## Predicted and Verified Deviation from Zipf's Law in Growing Social Networks

Zhang, Qunzhi; Sornette, Didier
Type: Scientific Journal Article
Search Relevance
46.9%
Zipf's power law is a general empirical regularity found in many natural and social systems. A recently developed theory predicts that Zipf's law corresponds to systems that are growing according to a maximally sustainable path in the presence of random proportional growth, stochastic birth and death processes. We report a detailed empirical analysis of a burgeoning network of social groups, in which all ingredients needed for Zipf's law to apply are verifiable and verified. We estimate empirically the average growth $r$ and its standard deviation $\sigma$ as well as the death rate $h$ and predict without adjustable parameters the exponent $\mu$ of the power law distribution $P(s)$ of the group sizes $s$. The predicted value $\mu = 0.75 \pm 0.05$ is in excellent agreement with maximum likelihood estimations. According to theory, the deviation of $P(s)$ from Zipf's law (i.e., $\mu < 1$) constitutes a direct statistical quantitative signature of the overall non-stationary growth of the social universe.; Comment: 4 pages, 2 figures, 2 tables

## Zipf's law, Hierarchical Structure, and Shuffling-Cards Model for Urban Development

Chen, Yanguang
Type: Scientific Journal Article
Search Relevance
46.82%
A new perspective is proposed to find the simple rules dominating complex systems and regular patterns behind random phenomena such as cities. Hierarchy of cities reflects the ubiquitous structure frequently observed in the natural world and social institutions. Where there is a hierarchy with cascade structure, there is a rank-size distribution following Zipf's law, and vice versa. The hierarchical structure can be described with a set of exponential functions that are identical in form to Horton-Strahler's laws on rivers and Gutenberg-Richter's laws on earthquake energy. From the exponential models, we can derive four power laws such as Zipf's law indicative of fractals and scaling symmetry. Research on the hierarchy helps us understand how complex systems are self-organized. A card-shuffling model is built to interpret the relation between Zipf's law and hierarchy of cities. This model can be expanded to explain the general empirical power-law distributions across the individual physical and social sciences, which are hard to comprehend within the specific scientific domains.; Comment: 28 pages, 8 figures
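The step from exponential laws of a hierarchy to a power law can be sketched generically as follows. The notation is assumed for illustration ($N_m$ for the number of cities at level $m$, $S_m$ for their typical size, $r_n$ and $r_s$ for the number and size ratios between adjacent levels) and is not necessarily the set of equations used in the paper.

$$
N_m = N_1\, r_n^{\,m-1}, \qquad S_m = S_1\, r_s^{\,1-m}.
$$

Solving the second relation for the level gives $m - 1 = -\ln(S_m/S_1)/\ln r_s$, and substituting into the first eliminates $m$:

$$
N_m = N_1 \left(\frac{S_m}{S_1}\right)^{-D}, \qquad D = \frac{\ln r_n}{\ln r_s}.
$$

Under these assumptions, two exponential laws in the level index combine into a power law between number and size. When the two ratios are equal (for example $r_n = r_s = 2$), the exponent is $D = 1$, the value associated with Zipf's law.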

## Zipf's Law in the Liquid Gas Phase Transition of Nuclei

Ma, Y. G.
Type: Scientific Journal Article
Search Relevance
46.8%
Zipf's law, known from the field of linguistics, is tested in nuclear disassembly within the framework of the isospin-dependent lattice gas model. It is found that the average cluster charge (or mass) of rank $n$ in the charge (or mass) list is exactly inversely proportional to its rank, i.e., Zipf's law holds, at the phase transition temperature. This novel criterion should be helpful in searching for the nuclear liquid gas phase transition experimentally and theoretically. In addition, the finite size scaling of the effective phase transition temperature at which Zipf's law appears is studied for several systems with different mass, and the critical exponents $\nu$ and $\beta$ are tentatively extracted.; Comment: 4 Pages, 4 Figures, REVTeX; Some misprints are corrected

## Zipf's Law in Importance of Genes for Cancer Classification Using Microarray Data

Li, Wentian
Type: Scientific Journal Article
Search Relevance
46.97%
Microarray data consists of mRNA expression levels of thousands of genes under certain conditions. A difference in the expression level of a gene at two different conditions/phenotypes, such as cancerous versus non-cancerous, one subtype of cancer versus another, before versus after a drug treatment, is indicative of the relevance of that gene to the difference of the high-level phenotype. Each gene can be ranked by its ability to distinguish the two conditions. We study how the single-gene classification ability decreases with its rank (a Zipf plot). A power-law function in the Zipf plot is observed for the four microarray datasets obtained from various cancer studies. This power-law behavior in the Zipf plot is reminiscent of similar power-law curves in other natural and social phenomena (Zipf's law). However, due to our choice of the measure of importance in classification ability, i.e., the maximized likelihood in a logistic regression, the exponent of the power-law function is a function of the sample size, instead of a fixed value close to 1 for a typical example of Zipf's law. The presence of this power-law behavior is important for deciding the number of genes to be used for a discriminant microarray data analysis.; Comment: 11 pages...

## Zipf's law, 1/f noise, and fractal hierarchy

Chen, Yanguang
Type: Scientific Journal Article
Search Relevance
46.95%
Fractals, 1/f noise, Zipf's law, and the occurrence of large catastrophic events are typical ubiquitous general empirical observations across the individual sciences which cannot be understood within the set of references developed within the specific scientific domains. All these observations are associated with scaling laws and have caused a broad research interest in the scientific circle. However, the inherent relationships between these scaling phenomena are still pending questions remaining to be researched. In this paper, theoretical derivation and mathematical experiments are employed to reveal the analogy between fractal patterns, 1/f noise, and the Zipf distribution. First, the multifractal process follows the generalized Zipf's law empirically. Second, a 1/f spectrum is identical in mathematical form to Zipf's law. Third, both 1/f spectra and Zipf's law can be converted into a self-similar hierarchy. Fourth, fractals, 1/f spectra, Zipf's law, and the occurrence of large catastrophic events can be described with similar exponential laws and power laws. The self-similar hierarchy is a more general framework or structure which can be used to encompass or unify different scaling phenomena and rules in both physical and social systems such as cities...

## Zipf's Law Leads to Heaps' Law: Analyzing Their Relation in Finite-Size Systems

Lu, Linyuan; Zhang, Zi-Ke; Zhou, Tao
Type: Scientific Journal Article
Search Relevance
47.05%
Background: Zipf's law and Heaps' law are observed in disparate complex systems. Of particular interest, these two laws often appear together. Many theoretical models and analyses have been performed to understand their co-occurrence in real systems, but a clear picture of their relation is still lacking. Methodology/Principal Findings: We show that Heaps' law can be considered as a derivative phenomenon if the system obeys Zipf's law. Furthermore, we refine the known approximate solution of the Heaps' exponent given the Zipf's exponent. We show that the approximate solution is indeed an asymptotic solution for infinite systems, while in the finite-size system the Heaps' exponent is sensitive to the system size. Extensive empirical analysis on tens of disparate systems demonstrates that our refined results can better capture the relation between the Zipf's and Heaps' exponents. Conclusions/Significance: The present analysis provides a clear picture of the relation between Zipf's law and Heaps' law without the help of any specific stochastic model, namely that Heaps' law is indeed a derivative phenomenon from Zipf's law. The presented numerical method gives a considerably better estimation of the Heaps' exponent given the Zipf's exponent and the system size. Our analysis provides some insights and implications of real complex systems...
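The claim that Heaps' law follows from Zipf's law can be explored with a small simulation. The sketch below is an illustration under stated assumptions, not the authors' numerical method: tokens are drawn independently from a finite vocabulary whose rank probabilities are proportional to $r^{-\alpha}$, and the number of distinct types seen so far is recorded as the text grows. The function names, vocabulary size, and exponent value are all hypothetical choices.

```python
import random


def zipf_weights(vocab_size, alpha):
    """Unnormalized Zipf weights r**(-alpha) for ranks 1..vocab_size."""
    return [rank ** -alpha for rank in range(1, vocab_size + 1)]


def vocabulary_growth(n_tokens, vocab_size, alpha, seed=0):
    """Sample tokens i.i.d. from a Zipf distribution and return, for each
    text length t = 1..n_tokens, the number of distinct types seen so far."""
    rng = random.Random(seed)
    ranks = range(1, vocab_size + 1)
    sample = rng.choices(ranks, weights=zipf_weights(vocab_size, alpha), k=n_tokens)
    seen = set()
    growth = []
    for rank in sample:
        seen.add(rank)
        growth.append(len(seen))
    return growth


# Illustrative parameters only (assumptions, not values from the paper).
growth = vocabulary_growth(n_tokens=5000, vocab_size=2000, alpha=1.2)
for t in (10, 100, 1000, 5000):
    print(t, growth[t - 1])
```

Plotting `growth` against text length on log-log axes shows the sublinear vocabulary growth that Heaps' law describes; because the vocabulary here is finite, the curve must eventually saturate, which is one simple way to see why the apparent Heaps' exponent can depend on system size.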

## Empirical Tests of Zipf's law Mechanism In Open Source Linux Distribution

Maillart, T.; Sornette, D.; Spaeth, S.; Von Krogh, G.
Type: Scientific Journal Article
Search Relevance
46.9%
The evolution of open source software projects in Linux distributions offers a remarkable example of a growing complex self-organizing adaptive system, exhibiting Zipf's law over four full decades. We present three tests of the usually assumed ingredients of stochastic growth models that have been previously conjectured to be at the origin of Zipf's law: (i) the growth observed between successive releases of the number of in-directed links of packages obeys Gibrat's law of proportional growth; (ii) the average growth increment of the number of in-directed links of packages over a time interval $\Delta t$ is proportional to $\Delta t$, while its standard deviation is proportional to $\sqrt{\Delta t}$; (iii) the distribution of the number of in-directed links of new packages appearing in evolving versions of Debian Linux distributions has a tail thinner than Zipf's law, with an exponent which converges to the Zipf's law value 1 as the time $\Delta t$ between releases increases.; Comment: 4 pages and 4 figures

## Recursive Subdivision of Urban Space and Zipf's law

Chen, Yanguang; Wang, Jiejing
Type: Scientific Journal Article
Search Relevance
46.88%
Zipf's law can be used to describe the rank-size distribution of cities in a region. It was seldom employed to research urban internal structure. In this paper, we demonstrate that the space-filling process within a city follows Zipf's law and can be characterized with the rank-size rule. A model of spatial disaggregation of urban space is presented to depict the spatial regularity of urban growth. By recursive subdivision of space, an urban region can be geometrically divided into two parts, four parts, eight parts, and so on, and form a hierarchy with cascade structure. If we rank these parts by size, the portions will conform to the Zipf distribution. By means of GIS technique and remote sensing data, the model of recursive subdivision of urban space is applied to three cities of China. The results show that the intra-urban hierarchy complies with Zipf's law, and the values of the rank-size scaling exponent are very close to 1. The significance of this study lies in three aspects. First, it shows that the strict subdivision of space is an efficient approach to revealing spatial order of urban form. Second, it discloses the relationships between urban space-filling process and the rank-size rule. Third, it suggests a new way of understanding fractals...

## Zipf's law arises naturally in structured, high-dimensional data

Aitchison, Laurence; Corradi, Nicola; Latham, Peter E.
Type: Scientific Journal Article
Search Relevance
46.99%
Zipf's law, which states that the probability of an observation is inversely proportional to its rank, has been observed in many different domains. Although there are models that explain Zipf's law in each of them, there is not yet a general mechanism that covers all, or even most, domains. Here we propose such a mechanism. It relies on the observation that real world data is often generated from some underlying, often low dimensional, causes - low dimensional latent variables. Those latent variables mix together multiple models that do not obey Zipf's law, giving a model that does obey Zipf's law. In particular, we show that when observations are high dimensional, latent variable models lead to Zipf's law under very mild conditions - conditions that are typically satisfied for real world data. We identify an underlying latent variable for language, neural data, and amino acid sequences, and we speculate that yet to be uncovered latent variables are responsible for Zipf's law in other domains.

## Large-scale analysis of Zipf's law in English texts

Moreno-Sánchez, Isabel; Font-Clos, Francesc; Corral, Álvaro
Type: Scientific Journal Article
Search Relevance
46.9%
Despite being a paradigm of quantitative linguistics, Zipf's law for words suffers from three main problems: its formulation is ambiguous, its validity has not been tested rigorously from a statistical point of view, and it has not been confronted with a representatively large number of texts. So, we can summarize the current support of Zipf's law in texts as anecdotal. We try to solve these issues by studying three different versions of Zipf's law and fitting them to all available English texts in the Project Gutenberg database (consisting of more than 30000 texts). To do so we use state-of-the-art tools in fitting and goodness-of-fit tests, carefully tailored to the peculiarities of text statistics. Remarkably, one of the three versions of Zipf's law, consisting of a pure power-law form in the complementary cumulative distribution function of word frequencies, is able to fit more than 40% of the texts in the database (at the 0.05 significance level), for the whole domain of frequencies (from 1 to the maximum value) and with only one free parameter (the exponent).

## Maximal nonsymmetric entropy leads naturally to Zipf's law

Liu, Chengshi
Type: Scientific Journal Article