Page 1 of results: 241 digital items found in 0.005 seconds

Zipf’s Law for Word Frequencies: Word Forms versus Lemmas in Long Texts

Corral, Álvaro; Boleda, Gemma; Ferrer-i-Cancho, Ramon
Source: Public Library of Science Publisher: Public Library of Science
Type: Journal Article
Published on 09/07/2015 EN
Search relevance
46.9%
Zipf’s law is a fundamental paradigm in the statistics of written and spoken natural language as well as in other communication systems. We raise the question of the elementary units for which Zipf’s law should hold in the most natural way, studying its validity for plain word forms and for the corresponding lemma forms. We analyze several long literary texts comprising four languages, with different levels of morphological complexity. In all cases Zipf’s law is fulfilled, in the sense that a power-law distribution of word or lemma frequencies is valid for several orders of magnitude. We investigate the extent to which the word-lemma transformation preserves two parameters of Zipf’s law: the exponent and the low-frequency cut-off. We are not able to demonstrate a strict invariance of the tail, as for a few texts both exponents deviate significantly, but we conclude that the exponents are very similar, despite the remarkable transformation that going from words to lemmas represents, considerably affecting all ranges of frequencies. In contrast, the low-frequency cut-offs are less stable, tending to increase substantially after the transformation.
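
The word-to-lemma transformation described in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical `lemma_of` dictionary mapping word forms to lemmas (a real study would use a proper lemmatizer), and the function name `rank_frequency` is invented here for illustration; this is not the authors' pipeline.

```python
from collections import Counter
import re


def rank_frequency(text, lemma_of=None):
    """Return token counts sorted from most to least frequent.

    `lemma_of` is a hypothetical word-form -> lemma mapping; when it is
    omitted, plain word forms are counted.
    """
    lemma_of = lemma_of or {}
    # Lowercase the text and keep runs of letters only (no digits, no underscore).
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(lemma_of.get(word, word) for word in words)
    return sorted(counts.values(), reverse=True)
```

Calling `rank_frequency(text)` and `rank_frequency(text, lemma_of)` on the same text gives the two frequency lists whose power-law exponents and cut-offs the abstract compares. Merging forms into lemmas reduces the number of types and raises some counts, which is why all frequency ranges are affected.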

The Evolution of the Exponent of Zipf's Law in Language Ontogeny

Baixeries, Jaume; Elvevåg, Brita; Ferrer-i-Cancho, Ramon
Source: Public Library of Science Publisher: Public Library of Science
Type: Journal Article
Published on 13/03/2013 EN
Search relevance
46.95%
It is well known that word frequencies arrange themselves according to Zipf's law. However, little is known about the dependency between the parameters of the law and the complexity of a communication system. Many models of the evolution of language assume that the exponent of the law remains constant as the complexity of a communication system increases. Using longitudinal studies of child language, we analysed the word rank distribution for the speech of children and adults participating in conversations. The adults typically included family members (e.g., parents) or the investigators conducting the research. Our analysis of the evolution of Zipf's law yields two main unexpected results. First, in children the exponent of the law tends to decrease over time while this tendency is weaker in adults, thus suggesting this is not a mere mirror effect of adult speech. Second, although the exponent of the law is more stable in adults, their exponents fall below 1, which is the typical value of the exponent assumed in both children and adults. Our analysis also shows a tendency of the mean length of utterances (MLU), a simple estimate of syntactic complexity, to increase as the exponent decreases. The parallel evolution of the exponent and a simple indicator of syntactic complexity (MLU) supports the hypothesis that the exponent of Zipf's law and linguistic complexity are inter-related. The assumption that Zipf's law for word ranks is a power-law with a constant exponent of one in both adults and children needs to be revised.

Zipf's Law for All the Natural Cities in the United States: A Geospatial Perspective

Jiang, Bin; Jia, Tao
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Search relevance
46.9%
This paper provides a new geospatial perspective on whether or not Zipf's law holds for all cities or for the largest cities in the United States using a massive dataset and its computing. A major problem around this issue is how to define cities or city boundaries. Most of the investigations of Zipf's law rely on the demarcations of cities imposed by census data, e.g., metropolitan areas and census-designated places. These demarcations or definitions (of cities) are criticized for being subjective or even arbitrary. Alternative solutions to defining cities are suggested, but they still rely on census data for their definitions. In this paper we demarcate urban agglomerations by clustering street nodes (including intersections and ends), forming what we call natural cities. Based on the demarcation, we found that Zipf's law holds remarkably well for all the natural cities (over 2-4 million in total) across the United States. There is little sensitivity for the holding with respect to the clustering resolution used for demarcating the natural cities. This is a big contrast to urban areas, as defined in the census data, which do not hold stable for Zipf's law. Keywords: Natural cities, power law, data-intensive geospatial computing...

Zipf's law and criticality in multivariate data without fine-tuning

Schwab, David J.; Nemenman, Ilya; Mehta, Pankaj
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Search relevance
46.82%
The joint probability distribution of many degrees of freedom in biological systems, such as firing patterns in neural networks or antibody sequence composition in zebrafish, often follows Zipf's law, where a power law is observed on a rank-frequency plot. This behavior has recently been shown to imply that these systems reside near a unique critical point where the extensive parts of the entropy and energy are exactly equal. Here we show analytically, and via numerical simulations, that Zipf-like probability distributions arise naturally if there is an unobserved variable (or variables) that affects the system, e.g., for neural networks, an input stimulus that causes individual neurons in the network to fire at time-varying rates. In statistics and machine learning, these models are called latent-variable or mixture models. Our model shows that no fine-tuning is required, i.e., Zipf's law arises generically without tuning parameters to a point, and gives insight into the ubiquity of Zipf's law in a wide range of systems.; Comment: 5 pages, 3 figures

Zipf's law for word frequencies: word forms versus lemmas in long texts

Corral, Alvaro; Boleda, Gemma; Ferrer-i-Cancho, Ramon
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Search relevance
46.9%
Zipf's law is a fundamental paradigm in the statistics of written and spoken natural language as well as in other communication systems. We raise the question of the elementary units for which Zipf's law should hold in the most natural way, studying its validity for plain word forms and for the corresponding lemma forms. In order to have as homogeneous sources as possible, we analyze some of the longest literary texts ever written, comprising four different languages, with different levels of morphological complexity. In all cases Zipf's law is fulfilled, in the sense that a power-law distribution of word or lemma frequencies is valid for several orders of magnitude. We investigate the extent to which the word-lemma transformation preserves two parameters of Zipf's law: the exponent and the low-frequency cut-off. We are not able to demonstrate a strict invariance of the tail, as for a few texts both exponents deviate significantly, but we conclude that the exponents are very similar, despite the remarkable transformation that going from words to lemmas represents, considerably affecting all ranges of frequencies. In contrast, the low-frequency cut-offs are less stable.

Theory of Zipf's Law and of General Power Law Distributions with Gibrat's law of Proportional Growth

Saichev, A.; Malevergne, Y.; Sornette, D.
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 13/08/2008
Search relevance
46.9%
We summarize a book under publication with this title written by the three present authors, on the theory of Zipf's law, and more generally of power laws, driven by the mechanism of proportional growth. The preprint is available upon request from the authors. For clarity, consistency of language and conciseness, we discuss the origin and conditions of the validity of Zipf's law using the terminology of firms' asset values. We use firms as the entities whose size distributions are to be explained. It should be noted, however, that most of the relations discussed in this book, especially the intimate connection between Zipf's and Gibrat's laws, underlie Zipf's law in diverse scientific areas. The same models and variations thereof can be straightforwardly applied to any of the other domains of application.; Comment: 11 pages, 1 figure with 4 panels, summary of a book in press

Zipf's law and maximum sustainable growth

Malevergne, Y.; Saichev, A.; Sornette, D.
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 01/12/2010
Search relevance
46.8%
Zipf's law states that the number of firms with size greater than S is inversely proportional to S. Most explanations start with Gibrat's rule of proportional growth but require additional constraints. We show that Gibrat's rule, at all firm levels, yields Zipf's law under a balance condition between the effective growth rate of incumbent firms (which includes their possible demise) and the growth rate of investments in entrant firms. Remarkably, Zipf's law is the signature of the long-term optimal allocation of resources that ensures the maximum sustainable growth rate of an economy.; Comment: 42 pages, 3 figures
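
The opening statement of this abstract can be connected to the more familiar rank-size form in one short step. The symbols $N(>S)$ (number of firms larger than $S$), $C$ (a constant) and $S_r$ (size of the $r$-th largest firm) are notation introduced here for illustration, not taken from the paper:

```latex
N(>S) = \frac{C}{S}
\quad\Longrightarrow\quad
r = N(\ge S_r) \approx \frac{C}{S_r}
\quad\Longrightarrow\quad
S_r \approx \frac{C}{r}
```

The middle step uses the fact that the $r$-th largest firm has, by definition, $r$ firms at least as large as itself, so its rank is the count given by the first relation.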

Zipf's Law for All the Natural Cities around the World

Jiang, Bin; Yin, Junjun; Liu, Qingling
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Search relevance
47.05%
Two fundamental issues surrounding research on Zipf's law regarding city sizes are whether and why this law holds. This paper does not deal with the latter issue with respect to why, and instead investigates whether Zipf's law holds in a global setting, thus involving all cities around the world. Unlike previous studies, which have mainly relied on conventional census data such as populations, and census-bureau-imposed definitions of cities, we adopt naturally (in terms of data speaks for itself) delineated cities, or natural cities, to be more precise, in order to examine Zipf's law. We find that Zipf's law holds remarkably well for all natural cities at the global level, and remains almost valid at the continental level except for Africa at certain time instants. We further examine the law at the country level, and note that Zipf's law is violated from country to country or from time to time. This violation is mainly due to our limitations; we are limited to individual countries, or to a static view on city-size distributions. The central argument of this paper is that Zipf's law is universal, and we therefore must use the correct scope in order to observe it. We further find Zipf's law applied to city numbers; the number of cities in the first largest country is twice as many as that in the second largest country...

Predicted and Verified Deviation from Zipf's Law in Growing Social Networks

Zhang, Qunzhi; Sornette, Didier
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 15/07/2010
Search relevance
46.9%
Zipf's power law is a general empirical regularity found in many natural and social systems. A recently developed theory predicts that Zipf's law corresponds to systems that are growing according to a maximally sustainable path in the presence of random proportional growth, stochastic birth and death processes. We report a detailed empirical analysis of a burgeoning network of social groups, in which all ingredients needed for Zipf's law to apply are verifiable and verified. We estimate empirically the average growth $r$ and its standard deviation $\sigma$ as well as the death rate $h$ and predict without adjustable parameters the exponent $\mu$ of the power law distribution $P(s)$ of the group sizes $s$. The predicted value $\mu = 0.75 \pm 0.05$ is in excellent agreement with maximum likelihood estimations. According to theory, the deviation of $P(s)$ from Zipf's law (i.e., $\mu < 1$) constitutes a direct statistical quantitative signature of the overall non-stationary growth of the social universe.; Comment: 4 pages, 2 figures, 2 tables

Zipf's law, Hierarchical Structure, and Shuffling-Cards Model for Urban Development

Chen, Yanguang
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 16/04/2011
Search relevance
46.82%
A new angle of view is proposed to find the simple rules dominating complex systems and regular patterns behind random phenomena such as cities. Hierarchy of cities reflects the ubiquitous structure frequently observed in the natural world and social institutions. Where there is a hierarchy with cascade structure, there is a rank-size distribution following Zipf's law, and vice versa. The hierarchical structure can be described with a set of exponential functions that are identical in form to Horton-Strahler's laws on rivers and Gutenberg-Richter's laws on earthquake energy. From the exponential models, we can derive four power laws such as Zipf's law indicative of fractals and scaling symmetry. Research on the hierarchy is revealing for us to understand how complex systems are self-organized. A card-shuffling model is built to interpret the relation between Zipf's law and hierarchy of cities. This model can be expanded to explain the general empirical power-law distributions across the individual physical and social sciences, which are hard to comprehend within the specific scientific domains.; Comment: 28 pages, 8 figures

Zipf's Law in the Liquid Gas Phase Transition of Nuclei

Ma, Y. G.
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Search relevance
46.8%
Zipf's law, known from linguistics, is tested in nuclear disassembly within the framework of the isospin-dependent lattice gas model. It is found that the average cluster charge (or mass) of rank $n$ in the charge (or mass) list is exactly inversely proportional to its rank, i.e., Zipf's law holds, at the phase transition temperature. This novel criterion should help in searching for the nuclear liquid gas phase transition experimentally and theoretically. In addition, the finite size scaling of the effective phase transition temperature at which Zipf's law appears is studied for several systems with different masses, and the critical exponents $\nu$ and $\beta$ are tentatively extracted.; Comment: 4 Pages, 4 Figures, ReVTEX; Some misprints are corrected

Zipf's Law in Importance of Genes for Cancer Classification Using Microarray Data

Li, Wentian
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 05/04/2001
Search relevance
46.97%
Microarray data consists of mRNA expression levels of thousands of genes under certain conditions. A difference in the expression level of a gene at two different conditions/phenotypes, such as cancerous versus non-cancerous, one subtype of cancer versus another, before versus after a drug treatment, is indicative of the relevance of that gene to the difference of the high-level phenotype. Each gene can be ranked by its ability to distinguish the two conditions. We study how the single-gene classification ability decreases with its rank (a Zipf's plot). Power-law function in the Zipf's plot is observed for the four microarray datasets obtained from various cancer studies. This power-law behavior in the Zipf's plot is reminiscent of similar power-law curves in other natural and social phenomena (Zipf's law). However, due to our choice of the measure of importance in classification ability, i.e., the maximized likelihood in a logistic regression, the exponent of the power-law function is a function of the sample size, instead of a fixed value close to 1 for a typical example of Zipf's law. The presence of this power-law behavior is important for deciding the number of genes to be used for a discriminant microarray data analysis.; Comment: 11 pages...

Zipf's law, 1/f noise, and fractal hierarchy

Chen, Yanguang
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 22/04/2011
Search relevance
46.95%
Fractals, 1/f noise, Zipf's law, and the occurrence of large catastrophic events are typical ubiquitous general empirical observations across the individual sciences which cannot be understood within the set of references developed within the specific scientific domains. All these observations are associated with scaling laws and have caused a broad research interest in the scientific circle. However, the inherent relationships between these scaling phenomena are still pending questions remaining to be researched. In this paper, theoretical derivation and mathematical experiments are employed to reveal the analogy between fractal patterns, 1/f noise, and the Zipf distribution. First, the multifractal process follows the generalized Zipf's law empirically. Second, a 1/f spectrum is identical in mathematical form to Zipf's law. Third, both 1/f spectra and Zipf's law can be converted into a self-similar hierarchy. Fourth, fractals, 1/f spectra, Zipf's law, and the occurrence of large catastrophic events can be described with similar exponential laws and power laws. The self-similar hierarchy is a more general framework or structure which can be used to encompass or unify different scaling phenomena and rules in both physical and social systems such as cities...

Zipf's Law Leads to Heaps' Law: Analyzing Their Relation in Finite-Size Systems

Lu, Linyuan; Zhang, Zi-Ke; Zhou, Tao
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Search relevance
47.05%
Background: Zipf's law and Heaps' law are observed in disparate complex systems. Of particular interest, these two laws often appear together. Many theoretical models and analyses have been performed to understand their co-occurrence in real systems, but a clear picture of their relation is still lacking. Methodology/Principal Findings: We show that Heaps' law can be considered a derivative phenomenon if the system obeys Zipf's law. Furthermore, we refine the known approximate solution of the Heaps' exponent given the Zipf's exponent. We show that the approximate solution is indeed an asymptotic solution for infinite systems, while in a finite-size system the Heaps' exponent is sensitive to the system size. Extensive empirical analysis on tens of disparate systems demonstrates that our refined results can better capture the relation between the Zipf's and Heaps' exponents. Conclusions/Significance: The present analysis provides a clear picture of the relation between Zipf's law and Heaps' law without the help of any specific stochastic model, namely that Heaps' law is indeed a derivative phenomenon of Zipf's law. The presented numerical method gives a considerably better estimation of the Heaps' exponent given the Zipf's exponent and the system size. Our analysis provides some insights and implications of real complex systems...
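
The claim that Heaps' law follows from Zipf's law can be explored with a small simulation. This is a sketch under stated assumptions, not the paper's numerical method: tokens are drawn independently from a Zipf distribution over a finite set of types, and the function name `vocabulary_growth` and its default parameters (`num_types`, `alpha`, `seed`) are chosen here purely for illustration.

```python
import random
from itertools import accumulate


def vocabulary_growth(num_tokens, num_types=10_000, alpha=1.0, seed=0):
    """Number of distinct types seen after each token of a Zipf-distributed stream.

    The type of rank r is drawn with probability proportional to r ** -alpha.
    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    weights = [rank ** -alpha for rank in range(1, num_types + 1)]
    cumulative = list(accumulate(weights))
    tokens = rng.choices(range(num_types), cum_weights=cumulative, k=num_tokens)

    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))  # vocabulary size after this token
    return growth
```

Plotting the returned list against the token index on log-log axes gives the Heaps-type curve. Because frequent types are drawn repeatedly, the vocabulary grows more slowly than the text length, and the finite `num_types` makes the curve bend as the system size is approached, which is the finite-size sensitivity the abstract refers to.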

Empirical Tests of Zipf's law Mechanism In Open Source Linux Distribution

Maillart, T.; Sornette, D.; Spaeth, S.; Von Krogh, G.
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 30/06/2008
Search relevance
46.9%
The evolution of open source software projects in Linux distributions offers a remarkable example of a growing complex self-organizing adaptive system, exhibiting Zipf's law over four full decades. We present three tests of the usually assumed ingredients of stochastic growth models that have been previously conjectured to be at the origin of Zipf's law: (i) the growth observed between successive releases of the number of in-directed links of packages obeys Gibrat's law of proportional growth; (ii) the average growth increment of the number of in-directed links of packages over a time interval $\Delta t$ is proportional to $\Delta t$, while its standard deviation is proportional to $\sqrt{\Delta t}$; (iii) the distribution of the number of in-directed links of new packages appearing in evolving versions of Debian Linux distributions has a tail thinner than Zipf's law, with an exponent which converges to the Zipf's law value 1 as the time $\Delta t$ between releases increases.; Comment: 4 pages and 4 figures

Recursive Subdivision of Urban Space and Zipf's law

Chen, Yanguang; Wang, Jiejing
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 25/09/2012
Search relevance
46.88%
Zipf's law can be used to describe the rank-size distribution of cities in a region. It was seldom employed to research urban internal structure. In this paper, we demonstrate that the space-filling process within a city follows Zipf's law and can be characterized with the rank-size rule. A model of spatial disaggregation of urban space is presented to depict the spatial regularity of urban growth. By recursive subdivision of space, an urban region can be geometrically divided into two parts, four parts, eight parts, and so on, and form a hierarchy with cascade structure. If we rank these parts by size, the portions will conform to the Zipf distribution. By means of GIS technique and remote sensing data, the model of recursive subdivision of urban space is applied to three cities of China. The results show that the intra-urban hierarchy complies with Zipf's law, and the values of the rank-size scaling exponent are very close to 1. The significance of this study lies in three aspects. First, it shows that the strict subdivision of space is an efficient approach to revealing spatial order of urban form. Second, it discloses the relationships between urban space-filling process and the rank-size rule. Third, it suggests a new way of understanding fractals...

Zipf's law arises naturally in structured, high-dimensional data

Aitchison, Laurence; Corradi, Nicola; Latham, Peter E.
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Search relevance
46.99%
Zipf's law, which states that the probability of an observation is inversely proportional to its rank, has been observed in many different domains. Although there are models that explain Zipf's law in each of them, there is not yet a general mechanism that covers all, or even most, domains. Here we propose such a mechanism. It relies on the observation that real world data is often generated from some underlying, often low dimensional, causes - low dimensional latent variables. Those latent variables mix together multiple models that do not obey Zipf's law, giving a model that does obey Zipf's law. In particular, we show that when observations are high dimensional, latent variable models lead to Zipf's law under very mild conditions - conditions that are typically satisfied for real world data. We identify an underlying latent variable for language, neural data, and amino acid sequences, and we speculate that yet to be uncovered latent variables are responsible for Zipf's law in other domains.

Large-scale analysis of Zipf's law in English texts

Moreno-Sánchez, Isabel; Font-Clos, Francesc; Corral, Álvaro
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 15/09/2015
Search relevance
46.9%
Despite being a paradigm of quantitative linguistics, Zipf's law for words suffers from three main problems: its formulation is ambiguous, its validity has not been tested rigorously from a statistical point of view, and it has not been confronted with a representatively large number of texts. So, we can summarize the current support of Zipf's law in texts as anecdotal. We try to solve these issues by studying three different versions of Zipf's law and fitting them to all available English texts in the Project Gutenberg database (consisting of more than 30000 texts). To do so we use state-of-the-art tools in fitting and goodness-of-fit tests, carefully tailored to the peculiarities of text statistics. Remarkably, one of the three versions of Zipf's law, consisting of a pure power-law form in the complementary cumulative distribution function of word frequencies, is able to fit more than 40% of the texts in the database (at the 0.05 significance level), for the whole domain of frequencies (from 1 to the maximum value) and with only one free parameter (the exponent).
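
To give a feel for one-parameter power-law fitting, here is a minimal sketch of a maximum-likelihood exponent estimate. It swaps in the standard continuous power-law estimator, which is simpler than the discrete fitting and goodness-of-fit tools the abstract mentions and is not the authors' procedure. The names `sample_power_law` and `mle_exponent`, and the density form $p(x) \propto x^{-\alpha}$ for $x \ge x_{\min}$, are assumptions made for this example.

```python
import math
import random


def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Draw n values with density proportional to x ** -alpha for x >= x_min.

    Uses inverse-transform sampling; requires alpha > 1.
    """
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power is always defined.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def mle_exponent(values, x_min=1.0):
    """Maximum-likelihood estimate of alpha for a continuous power law.

    alpha_hat = 1 + n / sum(log(x_i / x_min)) over the values >= x_min.
    """
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    return 1.0 + len(tail) / log_sum
```

Fitting synthetic data from `sample_power_law` and checking that `mle_exponent` recovers the chosen exponent is a quick sanity test. Real word-frequency data are discrete, so a faithful analysis would need a discrete likelihood and a goodness-of-fit test on top of this.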

Maximal nonsymmetric entropy leads naturally to Zipf's law

Liu, Chengshi
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 14/09/2006
Search relevance
46.82%
As the most fundamental empirical law, Zipf's law has been studied from many aspects, but its meaning is still an open problem. Some models have been constructed to explain Zipf's law. In this letter, a new concept named nonsymmetric entropy is introduced; maximizing nonsymmetric entropy leads naturally to Zipf's law.; Comment: 3 pages

Zipf’s law for cities: an empirical examination

Ioannides, Yannis Menelaos; Overman, Henry G.
Source: London School of Economics and Political Science Research Publisher: London School of Economics and Political Science Research
Type: Article; PeerReviewed Format: application/pdf
Published in 03/2003 EN
Search relevance
46.86%
We use data for metro areas in the United States, from the US Census for 1900 - 1990, to test the validity of Zipf's Law for cities. Previous investigations are restricted to regressions of log size against log rank. In contrast, we use a nonparametric procedure to calculate local Zipf exponents from the mean and variance of city growth rates. This also allows us to test for the validity of Gibrat's Law for city growth processes. Despite variation in growth rates as a function of city size, Gibrat's Law does hold. In addition the local Zipf exponents are broadly consistent with Zipf's Law. Deviations from Zipf's Law are easily explained by deviations from Gibrat's Law.