Traditionally, a cluster is defined as a collection of homogeneous nodes interconnected by a single high-performance communication technology. However, in some cases, cluster nodes may be organized into several partitions – sub-clusters – internally interconnected by one or more selected SAN technologies. To constitute a multi-networked cluster, sub-clusters must share a common SAN technology, or a bridge facility must be used.
In this paper we show how RoCL – a lightweight user-level communication library designed to support multi-threading in a multi-networked environment – manages to exploit such cluster organization. Performance evaluation results obtained by using two partitions of Myrinet and Gigabit SMP nodes demonstrate the usefulness of our approach both for low-level and high-level operation.
RoCL is a communication library that aims to exploit the low-level communication facilities of today’s cluster networking hardware and to merge, via the resource-oriented paradigm, those facilities with the high degree of parallelism achieved on SMP systems through multi-threading.
The communication model defines three major entities – contexts, resources and buffers – which permit the design of high-level solutions. A low-level distributed directory is used to support resource registration and discovery.
The usefulness and applicability of RoCL are briefly addressed through a basic modelling example – the implementation of TPVM over RoCL. Performance results for Myrinet and Gigabit Ethernet, currently supported in RoCL through GM and MVIA, respectively, are also presented.
The rapid expansion in the quantity and quality of RNA-Seq data requires the development of sophisticated high-performance bioinformatics tools capable of rapidly transforming this data into meaningful information that is easily interpretable by biologists. Currently available analysis tools are often not easily installed by the general biologist and most of them lack inherent parallel processing capabilities widely recognized as an essential feature of next-generation bioinformatics tools. We present here a user-friendly and fully automated RNA-Seq analysis pipeline (R-SAP) with built-in multi-threading capability to analyze and quantitate high-throughput RNA-Seq datasets. R-SAP follows a hierarchical decision making procedure to accurately characterize various classes of transcripts and achieves a near linear decrease in data processing time as a result of increased multi-threading. In addition, RNA expression level estimates obtained using R-SAP display high concordance with levels measured by microarrays.
Multi-threading is a popular choice for server architecture. Widely used servers, like the Apache web server and the MySQL database server, are written in a multi-threaded fashion. We investigate the effects of thread architecture on server performance from two angles: (1) number of user threads per kernel thread, and (2) use of blocking I/O vs. non-blocking I/O. We propose N-to-M threads with non-blocking I/O, a novel threading model, to provide higher performance for servers, and explain its advantages over other existing thread architectures, viz., 1-to-1 threads with blocking I/O, N-to-1 threads with non-blocking I/O, and N-to-M threads with blocking I/O. We demonstrate the efficacy of this threading model by showing performance improvement for Apache and MySQL. Results show that our threading model provides a performance improvement of 10–22% for Apache (for synthetic and real workloads), and 10–17% for MySQL (for TPC-W workload) over existing thread models.
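The general shape of an N-to-M arrangement can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `user_task`, `kernel_thread` and `run_n_to_m` are hypothetical, user-level threads are modelled as generators, the point where a non-blocking I/O call would report "not ready" is simulated by a bare `yield`, and tasks are statically assigned to kernel threads rather than migrated between them.

```python
import threading
from collections import deque


def user_task(task_id, steps, log):
    """A user-level thread, modelled as a generator (an assumption of this sketch)."""
    for step in range(steps):
        log.append((task_id, step))
        # Simulated non-blocking I/O: instead of blocking the kernel thread,
        # the task hands control back to its scheduler.
        yield


def kernel_thread(run_queue):
    """One kernel thread that multiplexes its user-level tasks round-robin."""
    while run_queue:
        task = run_queue.popleft()
        try:
            next(task)              # run the task until its next yield
            run_queue.append(task)  # not finished: requeue it
        except StopIteration:
            pass                    # task finished: drop it


def run_n_to_m(n_tasks, m_kernel_threads, steps=3):
    """Run N user-level tasks on M kernel threads and return one log per kernel thread."""
    logs = [[] for _ in range(m_kernel_threads)]
    queues = [deque() for _ in range(m_kernel_threads)]
    for task_id in range(n_tasks):
        k = task_id % m_kernel_threads  # static assignment, no migration
        queues[k].append(user_task(task_id, steps, logs[k]))
    threads = [threading.Thread(target=kernel_thread, args=(q,)) for q in queues]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return logs
```

Calling `run_n_to_m(6, 2)` runs six user-level tasks on two kernel threads; each log shows the tasks of one kernel thread interleaving at their yield points. Setting M to 1 gives the N-to-1 case, and giving each task its own kernel thread gives the 1-to-1 case.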
Emerging processor technologies are becoming commercially available that make multi-processor capabilities affordable for use in a large number of computer systems. Increasing power consumption by this next generation of processors is a growing concern as the cost of operating such systems continues to increase.
It is important to understand the characteristics of these emerging technologies in order to enhance their performance. By understanding the characteristics of high performance computing workloads on real systems, the overall efficiency with which such workloads are executed can be increased. In addition, it is important to determine the best trade-off between system performance and power consumption using the variety of system configurations that are possible with these new technologies.
This thesis seeks to provide a comprehensive presentation of the performance characteristics of several real, commercially available simultaneous-multithreading multi-processor architectures and to provide recommendations to improve overall system performance. It also provides solutions to reduce the power consumption of such systems while minimizing the performance impact of these techniques.
The results of the research conducted show that the new scheduler proposed in this thesis is capable of providing significant increases in efficiency for traditional and emerging multi-processor technologies. These findings are confirmed using real system performance and power measurements.; Thesis (Master...
Master's thesis, Informatics Engineering (Interaction and Knowledge), Universidade de Lisboa, Faculdade de Ciências, 2014; Linear perspective projection has been the predominant way of drawing images of three-dimensional spaces for centuries, whether the drawings are made by hand or by computer. Architects in particular use the linear perspective system in formal drawing to represent their work as it would be seen through human vision. However, this perspective system is limited in that capacity. At wider viewing angles, drawings that use linear perspective projection exhibit a distortion that makes the image hard to interpret and limits the ability of this perspective to truly represent human vision. At a viewing angle of 180°, the maximum possible in this system, the image becomes entirely unrecognizable. Non-linear, or curvilinear, forms of perspective projection, such as cylindrical or panoramic perspective or spherical perspective, do not have the same limitation. With these perspectives, the viewing angle can be widened up to 360° without this distortion appearing. However, these systems are not perfect solutions either, since rectilinearity is not preserved. The Extended Perspective System (EPS) was created by members of the NAADIR team (New Approach on Architectural Drawings Integrating computeR descriptions) to address the limitations of the existing individual systems. Their work...
Threads as considered in basic thread algebra are primarily looked upon as
behaviours exhibited by sequential programs on execution. It is a fact of life
that sequential programs are often fragmented. Consequently, fragmented program
behaviours are frequently found. In this paper, we consider this phenomenon. We
extend basic thread algebra with the barest mechanism for sequencing of threads
that are taken for fragments. This mechanism, called poly-threading, supports
both autonomous and non-autonomous thread selection in sequencing. We relate
the resulting theory to the algebraic theory of processes known as ACP and use
it to describe analytic execution architectures suited for fragmented programs.
We also consider the case where the steps of fragmented program behaviours are
interleaved in the ways of non-distributed and distributed multi-threading.; Comment: 24 pages, sections 9, 10, and 11 are added
A new high-level interface to multi-threading in Prolog, implemented in
hProlog, is described. Modern CPUs often contain multiple cores and through
high-level multi-threading a programmer can leverage this power without having
to worry about low-level details. Two common types of high-level explicit
parallelism are discussed: independent and-parallelism and competitive
or-parallelism. A new type of explicit parallelism, pipeline parallelism, is
proposed. This new type can be used in certain cases where independent
and-parallelism and competitive or-parallelism cannot be used.; Comment: Online Proceedings of the 11th International Colloquium on
Implementation of Constraint LOgic Programming Systems (CICLOPS 2011),
Lexington, KY, U.S.A., July 10, 2011
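The three kinds of explicit parallelism named above can be sketched with ordinary threads. The following Python sketch is only an analogy and not the hProlog interface: the function names are hypothetical, goals are modelled as plain callables, and the or-parallel version assumes all alternatives compute an acceptable answer, so whichever finishes first wins.

```python
import queue
import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

_DONE = object()  # sentinel marking the end of a pipeline stream


def and_parallel(goals):
    """Independent and-parallelism: run goals that share nothing, keep every result."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(goal) for goal in goals]
        return [f.result() for f in futures]


def or_parallel(alternatives):
    """Competitive or-parallelism: run alternatives, keep whichever finishes first."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(alt) for alt in alternatives]
        done, pending = wait(futures, return_when=FIRST_COMPLETED)
        for f in pending:
            f.cancel()  # only prevents tasks that have not started yet
        # Python threads cannot be killed, so leaving the with-block still
        # waits for alternatives that are already running.
        return next(iter(done)).result()


def pipeline(source, stages):
    """Pipeline parallelism: each stage runs in its own thread, linked by queues."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def run_stage(fn, inbox, outbox):
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # propagate end-of-stream downstream
                return
            outbox.put(fn(item))

    threads = [
        threading.Thread(target=run_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for item in source:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

In the pipeline case the stages depend on each other, so and-parallelism does not apply, and they are not alternatives, so or-parallelism does not apply either; the overlap comes from a later stage consuming items while an earlier stage is still producing them.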
Speculative multi-threading (SpMT) has been proposed as a promising method
to exploit the hardware potential of Chip Multiprocessors (CMP). It is a
thread-level speculation (TLS) model that depends mainly on software and
hardware co-design. This paper investigates speculative thread-level
parallelism in general-purpose programs and presents a speculative
multi-threading execution model called Prophet. The architectural support for
the Prophet execution model is designed on top of CMP. In Prophet, inter-thread
data dependencies are predicted by a pre-computation slice (p-slice) to reduce
RAW violations. Prophet's multi-versioning cache system, together with a thread
state control mechanism in the architectural support, buffers speculative data,
and a snooping-bus-based cache coherence protocol detects data dependence
violations. The simulation-based evaluation shows that the Prophet system could
achieve significant speedup for general-purpose programs.; Comment: 9 pages
Multi-threading allows agents to pursue a heterogeneous collection of tasks
in an orderly manner. The view of multi-threading that emerges from thread
algebra is applied to the case where a single agent, who may be human,
maintains a hierarchical multithread as an architecture of its own activities.
We present the new multi-threaded version of the state-of-the-art answer set
solver clasp. We detail its component and communication architecture and
illustrate how they support the principal functionalities of clasp. Also, we
provide some insights into the data representation used for different
constraint types handled by clasp. All this is accompanied by an extensive
experimental analysis of the major features related to multi-threading in
clasp.; Comment: 19 pages, 5 figures, to appear in Theory and Practice of Logic
Multi-threading is currently supported by several well-known Prolog systems
providing a highly portable solution for applications that can benefit from
concurrency. When multi-threading is combined with tabling, we can exploit the
power of higher procedural control and declarative semantics. However, despite
the availability of both threads and tabling in some Prolog systems, the
implementation of these two features implies complex ties to each other and to
the underlying engine. Until now, XSB was the only Prolog system combining
multi-threading with tabling. In XSB, tables may be either private or shared
between threads. While thread-private tables are easier to implement, shared
tables have all the associated issues of locking, synchronization and potential
deadlocks. In this paper, we propose an alternative view to XSB's approach. In
our proposal, each thread views its tables as private but, at the engine level,
we use a common table space where tables are shared among all threads. We
present three designs for our common table space approach: No-Sharing (NS)
(similar to XSB's private tables), Subgoal-Sharing (SS) and Full-Sharing (FS).
The primary goal of this work was to reduce the memory usage for the table
space, but our experimental results...
This paper presents the multi-threading and internet message communication
capabilities of Qu-Prolog. Message addresses are symbolic and the
communications package provides high-level support that completely hides
details of IP addresses and port numbers as well as the underlying TCP/IP
transport layer. The combination of multi-threading and high-level
inter-thread message communication provides simple, powerful support for
implementing internet distributed intelligent applications.; Comment: Appeared in Theory and Practice of Logic Programming, vol. 1, no. 3,
This paper presents two conceptually simple methods for parallelizing a
Parallel Tempering Monte Carlo simulation in a distributed volunteer computing
context, where computers belonging to the general public are used. The first
method uses conventional multi-threading. The second method uses CUDA, a
graphics card computing system. Parallel Tempering is described, and challenges
such as parallel random number generation and mapping of Monte Carlo chains to
different threads are explained. While conventional multi-threading on CPUs is
well-established, GPGPU programming techniques and technologies are still
developing and present several challenges, such as the effective use of a
relatively large number of threads. Having multiple chains in Parallel
Tempering allows parallelization in a manner that is similar to the serial
algorithm. Volunteer computing introduces important constraints to high
performance computing, and we show that both versions of the application are
able to adapt themselves to the varying and unpredictable computing resources
of volunteers' computers, while leaving the machines responsive enough to use.
We present experiments to show the scalable performance of these two
approaches, and indicate that the efficiency of the methods increases with
bigger problem sizes.; Comment: 15 pages...
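A minimal Python sketch of the conventional multi-threading variant is given below, under stated assumptions: the double-well `energy` function, the step sizes and all function names are hypothetical, and the swap rule is the standard Parallel Tempering acceptance probability min(1, exp((β_i − β_j)(E_i − E_j))) rather than anything taken from the paper. Each chain is mapped to its own thread and owns a separately seeded random number generator, which is one simple way to address the parallel random number generation problem mentioned above.

```python
import math
import random
import threading


def energy(x):
    """Hypothetical double-well potential used only for illustration."""
    return (x * x - 1.0) ** 2


def metropolis_sweep(x, beta, rng, steps=100):
    """Advance one chain at inverse temperature beta with Metropolis updates."""
    for _ in range(steps):
        candidate = x + rng.uniform(-0.5, 0.5)
        delta = energy(candidate) - energy(x)
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            x = candidate
    return x


def swap_probability(beta_i, beta_j, e_i, e_j):
    """Standard replica-exchange acceptance probability."""
    arg = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if arg >= 0 else math.exp(arg)


def parallel_tempering(betas, rounds=20, seed=0):
    """Run one thread per chain, then attempt neighbour swaps after each round."""
    rngs = [random.Random(seed + i) for i in range(len(betas))]  # one RNG per chain
    swap_rng = random.Random(seed - 1)                           # used only between rounds
    states = [0.0] * len(betas)

    def worker(i):
        # Each thread reads and writes only its own slot and its own RNG.
        states[i] = metropolis_sweep(states[i], betas[i], rngs[i])

    for _ in range(rounds):
        threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(betas))]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # barrier: all chains finish before any swap is attempted
        for i in range(len(betas) - 1):
            p = swap_probability(betas[i], betas[i + 1],
                                 energy(states[i]), energy(states[i + 1]))
            if swap_rng.random() < p:
                states[i], states[i + 1] = states[i + 1], states[i]
    return states
```

Because no random state is shared between threads, the result depends only on the seed and not on thread scheduling. In CPython the global interpreter lock prevents these threads from running the sweeps truly in parallel, so the sketch shows the mapping of chains to threads rather than a speedup.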
This paper introduces a simple method for producing multichannel MIDI music
that is based on randomness and simple probabilities. One distinctive feature
of the method is that it produces and sends to the sound card, in parallel,
more than one unsynchronized channel by exploiting the multi-threading
capabilities of general-purpose programming languages. As a consequence, the
derived sound offers a quite "full" and "unpredictable" acoustic experience to
the listener. Subsequently, the paper reports the results of an evaluation with
users. The results were very surprising: the majority of users responded that
they could tolerate this music on various occasions.; Comment: 7 pages, 5 figures
The inability to predict lasting languages and architectures led us to
develop OCCA, a C++ library focused on host-device interaction. Using run-time
compilation and macro expansions, the result is a novel single kernel language
that expands to multiple threading languages. Currently, OCCA supports device
kernel expansions for the OpenMP, OpenCL, and CUDA platforms. Computational
results using finite difference, spectral element and discontinuous Galerkin
methods show OCCA delivers portable high performance in different architectures
and platforms.; Comment: 25 pages, 6 figures, 9 code listings, 8 tables, Submitted to the SIAM
Journal on Scientific Computing (SISC), presented at the Oil & Gas Workshop
2014 at Rice University
In this work, we present an automatic way to parallelize logic programs for
finding all the answers to queries, using a transformation to low-level
threading primitives. Although much work on parallelizing logic programming was
done more than a decade ago (e.g., Aurora, Muse, YapOR), the current state of
parallelizing logic programs is still very poor. This work presents a way to
parallelize tabled logic programs in XSB Prolog under the well-founded
semantics. An important contribution of this work lies in merging answer tables
from multiple child threads without incurring copying or full sharing and
synchronization of data structures. The implementation of the parent-children
shared answer tables is more efficient than all the other data structures
currently implemented for completion of answers in parallelization using
multi-threading. The transformation and its lower-level answer-merging
predicates were implemented as an extension to the XSB system.
Multi-core processors and hardware multi-threading make it possible to increase application performance. On the one hand, multi-core processors combine two or more processors on a single chip. On the other hand, hardware multi-threading is a technique that increases the utilization of processor resources. This work presents a performance analysis of the results obtained for two applications, dense matrix multiplication and the fast Fourier transform. Both applications were run on multi-core architectures that exploit thread-level parallelism but use different multi-threading models. The results show the importance of understanding and being able to analyze the effect of multi-core and multi-threading on performance.
Although parallel computing has been widely researched, the process of bringing concurrency and parallel programming to the mainstream has just begun. Combining a distributed multi-threading environment like PM2 with Prolog opens the way to exploit concurrency and parallel computing using logic programming. To this end, PM2-Prolog, a Prolog interface to the PM2 system, was developed. It allows multi-threaded Prolog applications to run on multiple GNU Prolog instances in a distributed environment, thereby taking advantage of the resources available on the computers connected to a network. The system is particularly advantageous for computationally heavy problems where execution time is crucial. Its API offers primitives for thread management and for explicit communication between threads. Preliminary tests show a nearly linear performance gain compared with a sequential version.