Friday, December 31, 2010

The History Of The SPARC processor

In the history of processor evolution, SPARC holds a special place. It was the flagship of Sun Microsystems, then the king of high-performance servers. Somewhere along the journey it lost its place to the onslaught of the high-performance, multicore brigade of Intel/AMD chips; IBM Power was, and is, a formidable opponent too. Oracle's purchase of Sun added to the uncertainties.

Besides the version changes and improvements covered in the article, what is interesting is that SPARC was a shining example of the RISC movement that shook the computing world quite a bit. The RISC philosophy observes that about 80% of the computation work is done by about 20% of the instructions of any computer system, so its proponents advocated simplifying the ISA of processors. The raw power available with today's CISC processors is so great that this efficiency debate has faded away. That SPARC was an important milestone of silicon processor evolution history, and that it held an important place for some 20 odd years, is undeniable.

The History Of The SPARC processor « UNIX

Wednesday, December 29, 2010

Are 4 bit processors in use still?!

One would think the 4-bitters are dead by now! Not so! They survive, and they do so in some interesting ways. Read the complete article below for details; Robert Cravotta has done a wonderful job of coverage in his article. But here are the highlights.


1. There are quite a few manufacturers still selling 4-bit devices: Atmel, EM Microelectronic, and Epson, as well as NEC Electronics, Renesas, Samsung, and National.
2. Some are continuing their lines to support legacy applications, which is not very surprising; Atmel and EM Microelectronic are among them. EM chips appear largely in timepiece designs.
3. These go into high-volume products such as the Gillette Fusion ProGlide.
4. EM Microelectronic sells them as ROM-based devices; developers use them as just another hard-configured device, not programmable at all.
5. Since many of these devices can work from a 0.6 V power supply, in low-duty-cycle applications — where the device is possibly sleeping 90% of the time — a single battery can serve for the lifetime of the appliance.
6. What else could be responsible for letting these devices survive so long!!
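To see why a single battery can last the life of the appliance, here is a rough back-of-envelope sketch in Python. Every number in it — the currents, the duty cycle, the battery capacity — is an assumption chosen for illustration, not a datasheet figure.

```python
# Back-of-envelope battery-life estimate for a heavily duty-cycled 4-bit MCU.
# All numbers below are illustrative assumptions, not datasheet values.

ACTIVE_CURRENT_UA = 100.0   # assumed current while awake (microamps)
SLEEP_CURRENT_UA = 0.5      # assumed sleep current (microamps)
DUTY_CYCLE = 0.10           # awake 10% of the time, sleeping 90%
BATTERY_CAPACITY_MAH = 220  # nominal capacity of a typical CR2032 coin cell

# Time-weighted average current over the awake/asleep cycle
avg_current_ua = DUTY_CYCLE * ACTIVE_CURRENT_UA + (1 - DUTY_CYCLE) * SLEEP_CURRENT_UA

# Capacity (in microamp-hours) divided by average draw gives runtime in hours
lifetime_hours = (BATTERY_CAPACITY_MAH * 1000) / avg_current_ua
lifetime_years = lifetime_hours / (24 * 365)

print(f"average current: {avg_current_ua:.2f} uA")
print(f"estimated lifetime: {lifetime_years:.1f} years")
```

Under these made-up numbers the average draw is about 10 µA and a coin cell lasts a couple of years; with the sub-microamp sleep currents and deeper duty cycles of real 4-bit parts, the lifetime stretches far further.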

Monday, December 27, 2010

iPad 2 likely to have dual core chip

Ashok Kumar, an analyst at the full-service investment bank Rodman & Renshaw, foresees Apple upgrading the iPad as well as the iPhone to dual-core processors. He expects this upgrade to be to a pair of 1 GHz ARM Cortex-A9 cores, coming in March for the iPad and by late summer for the iPhone. The timing may be in question, but the fact that an upgrade is required to compete with devices like the RIM PlayBook and the Motorola tablet is a given.

Analyst: iPad 2 to sport dual-core chip



Wednesday, December 22, 2010

Power Architecture and Energy Management

The latest version of the ISA specification for the Power architecture, Power ISA v.2.06, published in February last year, introduces mechanisms in the architecture that help power management in chips based on the spec. Besides decreasing energy demand in general, processors used in embedded applications need power management so that things can be done at as low a power demand as possible. The overall architecture has to take low consumption into account, and layouts must be made accordingly, of course.


Most of these sophisticated chips use dynamic circuitry that draws more power as the clock speed goes up. So, sections of hardware circuitry that are not needed during a particular use can have their clock choked off. This well-known technique, "clock gating", has been appearing in processor chips for some time, and the specification now introduces it as a tool to manage power. The core frequency as a whole can also be manipulated to manage consumption: for example, it can be pushed up when the processor has to handle a heavy processing load and reduced in other situations. Software can dynamically increase or decrease the core's clock frequency while the rest of the system keeps operating at an earlier value.
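To get a feel for why clock gating and frequency scaling save power, here is a tiny Python sketch of the usual first-order model of dynamic CMOS power, P ≈ α·C·V²·f. The constants are made-up illustrative values, not numbers from the Power ISA spec or any real chip.

```python
# Rough first-order model of dynamic CMOS power: P = alpha * C * V^2 * f.
# All constants are illustrative assumptions, not figures from any real device.

def dynamic_power(c_eff_farads, v_volts, f_hz, activity=1.0):
    """Switching power of a clocked block; activity=0.0 models a clock-gated block."""
    return activity * c_eff_farads * v_volts ** 2 * f_hz

C_EFF = 1e-9    # assumed effective switched capacitance (1 nF)
VDD = 1.0       # assumed supply voltage (V)
F = 2e9         # assumed core clock (2 GHz)

full = dynamic_power(C_EFF, VDD, F)                 # block fully active
half_clock = dynamic_power(C_EFF, VDD, F / 2)       # software halves the clock
gated = dynamic_power(C_EFF, VDD, F, activity=0.0)  # clock gated: no switching at all

print(f"full: {full} W, half clock: {half_clock} W, gated: {gated} W")
```

Halving the frequency halves the dynamic power, and gating the clock to an idle block drops its switching power to zero — which is exactly why these two knobs are the workhorses of power management.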


The Power ISA v.2.06 allows for power management with hypervisors and virtualization on single- and multi-core processor implementations. Dynamic energy management lets parts of the core operate while other parts that are not required are power-gated; for example, execution units in the processor pipeline could be power-gated when idle. The architecture offers software-selectable power-saving modes. These modes may reduce functionality in some areas, such as limiting cache and bus-snooping operations. In some operating scenarios you may turn off all functional units except for interrupts. The architecture now also enables execution of an instruction that can shut off the chip and let it wake up only on an external event. Read the following article for further details.

Energy Management in Power Architecture

Friday, December 10, 2010

Oracle to halve core count in next Sparc processor

Would you believe this! Oracle, who now owns Sun, wants fewer cores on their latest Sparc version. The Sparc T3 had graduated to 16 cores, but now they have announced that the next Sparc T4 will have only 8 cores! All you budding computer scientists, try and figure out why! Read the article. I may even do a post summarizing the reason!

Oracle to halve core count in next Sparc processor | Hardware - InfoWorld

Sunday, December 5, 2010

NEON Technology (Advanced SIMD)

NEON technology is essentially an advanced version of SIMD (single instruction, multiple data). 64-bit and 128-bit SIMD instructions are combined to increase performance in standardized media and signal-processing applications. NEON can execute MP3 audio decoding on CPUs running at just 10 MHz and can run the GSM AMR (Adaptive Multi-Rate) speech codec at no more than 13 MHz. Data types and operators of different sizes, such as 8-, 16-, 32- and 64-bit integer and single-precision (32-bit) floating-point data, are supported by NEON technology. These data types and operations are very efficient for handling audio and video processing as well as graphics and gaming workloads. The NEON hardware shares the same floating-point registers as used in Vector Floating Point. In NEON, the SIMD support allows up to 16 operations at the same time.
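As a conceptual sketch — in Python, not actual NEON code — here is what a single 128-bit SIMD add does when the register is treated as eight 16-bit lanes. The lane values are arbitrary examples; the point is that one "instruction" touches every lane at once.

```python
# Conceptual model of a 128-bit SIMD register as eight 16-bit lanes.
# One "instruction" applies the same operation to every lane simultaneously,
# wrapping around at 16 bits the way unsigned hardware lanes do.

LANES = 8
LANE_MASK = 0xFFFF  # each lane is 16 bits wide

def simd_add_u16(a, b):
    """Lane-wise add of two 8x16-bit vectors, modulo 2^16 per lane."""
    assert len(a) == len(b) == LANES
    return [(x + y) & LANE_MASK for x, y in zip(a, b)]

a = [1, 2, 3, 4, 5, 6, 0xFFFF, 0x8000]
b = [10, 20, 30, 40, 50, 60, 1, 0x8000]
print(simd_add_u16(a, b))  # note the last two lanes wrap to 0
```

A scalar CPU would need eight separate adds for this; the SIMD unit does all eight in one go, which is where the media-codec speedups come from.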




Contributed by: Amalendu Si, IT department

Tuesday, November 2, 2010

Android phones outsell iPhone 2-to-1, says research firm - Computerworld

Android is coming on strong, looks like! Would someone take up a post on Android here? The main features and, if possible, what's making it so popular?

Android phones outsell iPhone 2-to-1, says research firm - Computerworld


Wednesday, October 13, 2010

The 37th Annual Microprocessor Directory

EDN has been publishing these extensive directories over the years, and they are a reputed source of information on electronic technology. If you want to find out what's available in microprocessors today, check this out. The first link points to the directory of manufacturers; the second provides more details of the devices available today.


The 37th annual microprocessor directory: a universe explored


Saturday, October 2, 2010

GPU Computing – From graphics operations to specialized high performance general purpose computing

A graphics object consists of a number of vertices, the interrelationships among those vertices, the color, lighting, and texture information of those vertices, and a description of how to render them onto a display. With the advancement of complex graphics objects, where billions of vertices have to be taken into account, normal processors are not efficient enough to perform graphics operations, especially under real-time constraints. So a special type of processor was needed that could perform graphics operations efficiently, and thus the Graphics Processing Unit came into existence.
Graphics Processing Unit:
A Graphics Processing Unit (GPU) is conventionally a specialized microprocessor for graphics operation. A GPU’s primary job is to accelerate the rendering of graphics objects onto a display.
In graphics operations, vertex manipulation is specified by a vertex shader, and attribute manipulation (information about the vertices, i.e., color, lighting, texture, and other associated data) is specified by a pixel shader. A shader is basically a subroutine or program executed within the GPU which transforms the input data into the output data needed to calculate or define some aspect of the final image.

In a GPU, the calculation for one pixel is generally independent of every other pixel, so the architecture of the GPU has evolved to process many hundreds of pixels simultaneously. In support of this, the GPU can be executing shaders and resolving data values from thousands of geometric objects concurrently.
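That data-parallel pattern can be sketched in a few lines of Python; the per-pixel "kernel" and the tiny image below are made up purely for illustration, and of course a real GPU would run the kernel over thousands of pixels concurrently rather than in a loop.

```python
# The data-parallel pattern behind GPU shading: the same per-pixel kernel runs
# independently for every pixel, so execution order doesn't matter and the
# hardware is free to run them all at once. Kernel and image are illustrative.

def brighten(pixel, gain=2):
    """Per-pixel kernel: depends only on its own pixel, no neighbors."""
    return min(255, pixel * gain)

image = [10, 100, 200, 30]             # a tiny 1-D "image" of gray values
result = [brighten(p) for p in image]  # a GPU would do these concurrently
print(result)
```

Because `brighten` never reads another pixel, the hardware can evaluate all invocations in any order, or all at once — which is precisely the independence property the paragraph above describes.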

In the past two decades, the GPU has made enormous advances in performance and capability. GPU evolution has consumed transistors at a very high rate, more than doubling every one and a half years. NVIDIA's latest GPU chip, in 40 nm technology, employs billions of transistors.

Such massive parallelism means the GPU's usefulness is not limited to graphics alone. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has created a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. It is being applied to some of the most complex parallel computing problems that in the past were addressable only by supercomputers. The use of high-performance GPUs as supercomputers opens new opportunities for many applications that can use the GPU's massively parallel computational capabilities and can benefit from its cost and availability; examples relate to areas such as seismic detection, oil exploration, and medical imaging. These efforts in general-purpose computing on the GPU, also known as GPU computing, have positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future.

The GPU is designed for a particular class of applications with the following characteristics.

·         Computational requirements are large: Real-time rendering requires billions of pixels per second, and each pixel requires hundreds or more operations.

·         Parallelism is substantial: Fortunately, the graphics pipeline is well suited for parallelism.


·         Throughput is more important than latency: GPU implementations of the graphics pipeline prioritize throughput over latency. The human visual system operates on millisecond time scales, while operations within a modern processor take nanoseconds. This six-order-of-magnitude gap means that the latency of any individual operation is unimportant.

Modern application of GPU:

·         Differential equations: The earliest attempts to use GPUs for nongraphics computation involved solving large sets of differential equations. GPUs have been heavily used to solve problems in partial differential equations (PDEs).

·         Linear algebra: Sparse and dense linear algebra routines are the core building blocks for a huge class of numeric algorithms, including many PDE solvers mentioned above. Applications include simulation of physical effects such as fluids, heat, and radiation.

·         Search and database queries: Researchers have also implemented several forms of search on the GPU, such as binary search and nearest neighbor search as well as high-performance database operations that build on special-purpose graphics hardware.


Experience has shown that when algorithms and applications can follow design principles for GPU computing such as the PDE solvers, linear algebra packages, and database systems referenced above, they can achieve 10–100 times speedups over even mature, optimized CPU codes.

Can the GPUs outpace the CPUs?

Contributed by: Subhasis Koley, CSE, MIT

Friday, October 1, 2010

Quantum Computer

A quantum computer is any device for computation that makes direct use of distinctively quantum mechanical phenomena, such as superposition and entanglement, to perform operations on data. In a classical (or conventional) computer, the amount of data is measured by bits; in a quantum computer, it is measured by qubits. The basic principle of quantum computation is that the quantum properties of particles can be used to represent and structure data and that quantum mechanisms can be devised and built to perform operations with these data.
It is widely believed that if large-scale quantum computers can be built, they will be able to solve certain problems faster than any classical computer. Quantum computers are different from classical computers such as DNA computers and computers based on transistors, even though these may ultimately use some kind of quantum mechanical effect (for example covalent bonds). Some computing architectures such as optical computers may use classical superposition of electromagnetic waves, but without some specifically quantum mechanical resource such as entanglement, they do not share the potential for computational speed-up of quantum computers.

The Bloch sphere is a representation of a qubit, the fundamental building block of quantum computers.





The basis of quantum computing:
In quantum mechanics, the state of a physical system (such as an electron or a photon) is described by a vector in a mathematical object called a Hilbert space. The realization of the Hilbert space depends on the particular system. For instance, in the case of a single particle system in three dimensions, the state can be described by a complex-valued function defined on R3 (three-dimensional space) called a wave function. As described in the article on quantum mechanics, this function has a probabilistic interpretation; of particular significance is that quantum states can be in a superposition of the basis states. The time evolution of the system state vector is assumed to be unitary, meaning that it is reversible.
A classical computer has a memory made up of bits, where each bit holds either a one or a zero. The device computes by manipulating those bits, i.e. by transporting these bits from memory to (possibly a suite of) logic gates and back. A quantum computer maintains a set of qubits. A qubit can hold a one, or a zero, or a superposition of these. A quantum computer operates by manipulating those qubits, i.e. by transporting these bits from memory to (possibly a suite of) quantum logic gates and back.
Qubits for a quantum computer can be implemented using particles with two spin states: "up" and "down" (typically written |0⟩ and |1⟩). In fact, any system possessing an observable quantity A which is conserved under time evolution and such that A has at least two discrete and sufficiently spaced consecutive eigenvalues is a suitable candidate for implementing a qubit, since any such system can be mapped onto an effective spin-1/2 system.
Bits vs. qubits:
Consider first a classical computer that operates on a 3 bit register. At any given time, the bits in the register are in a definite state, such as 101. In a quantum computer, however, the qubits can be in a superposition of all the classically allowed states. In fact, the register is described by a wave function:
|ψ⟩ = α|000⟩ + β|001⟩ + γ|010⟩ + …
where the coefficients α, β, γ, … are complex numbers whose squared amplitudes are the probabilities of measuring the qubits in each state. Consequently, |γ|² is the probability of measuring the register in the state 010. That these numbers are complex is important, because the phases of the numbers can constructively and destructively interfere with one another, an important feature for quantum algorithms.
For an n-qubit quantum register, recording the state of the register requires 2^n complex numbers (the 3-qubit register requires 2^3 = 8 numbers). Consequently, the number of classical states encoded in a quantum register grows exponentially with the number of qubits. For n = 300, this is roughly 10^90, more states than there are atoms in the known universe. Note that the coefficients are not all independent, since the probabilities must sum to 1. The representation is also (for most practical cases) non-unique, since there is no way to physically distinguish between a particular quantum register and a similar one where all of the amplitudes have been multiplied by the same phase such as −1, i, or in general any number on the complex unit circle. One can show the dimension of the set of states of an n-qubit register is 2^(n+1) − 2.
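A small Python sketch makes the bookkeeping concrete: a 3-qubit register is a vector of 2^3 = 8 complex amplitudes whose squared magnitudes sum to 1. The amplitudes below are arbitrary illustrative values, normalized before use.

```python
import math

# A 3-qubit register is a vector of 2^3 = 8 complex amplitudes.
# The raw amplitudes here are arbitrary illustrative values; we normalize them
# so that the squared magnitudes (the measurement probabilities) sum to 1.

n = 3
amps = [1 + 0j, 0.5j, 0.5, 0, 0, 0, 0, 0.5]
norm = math.sqrt(sum(abs(a) ** 2 for a in amps))
state = [a / norm for a in amps]

probs = [abs(a) ** 2 for a in state]
assert len(state) == 2 ** n                 # 2^n numbers describe n qubits
assert abs(sum(probs) - 1.0) < 1e-12       # probabilities sum to 1
print(probs[2])  # index 2 is binary 010, so this is |gamma|^2
```

Growing `n` by one doubles the length of `state` — which is the exponential blow-up that makes classical simulation of large quantum registers hopeless.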
Initialization, execution and termination:
A 3-qubit register can thus be thought of as an 8-dimensional complex vector. An algorithm for a quantum computer must initialize this vector in some specified form (dependent on the design of the quantum computer). In each step of the algorithm, the vector is modified by multiplying it by a unitary matrix. The matrix is determined by the physics of the device, and the unitary character of the matrix ensures it is invertible.
Upon termination of the algorithm, the 8-dimensional complex vector stored in the register must be somehow read off from the qubit register by a quantum measurement. However, by the laws of quantum mechanics, that measurement will yield a random 3 bit string (and it will destroy the stored state as well). This random string can be used in computing the value of a function because (by design) the probability distribution of the measured output bitstring is skewed in favor of the correct value of the function. By repeated runs of the quantum computer and measurement of the output, the correct value can be determined, to a high probability, by majority polling of the outputs. See quantum circuit for a more precise formulation. In brief, quantum computations are probabilistic.
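The repeated-run readout can be sketched as sampling from a skewed output distribution and taking the most common result. The distribution below is invented for illustration — it is not the output of any real quantum algorithm — but it shows why majority polling recovers the answer with high probability.

```python
import random
from collections import Counter

random.seed(42)  # fixed seed so the sketch is reproducible

# Illustrative skewed output distribution of a hypothetical quantum algorithm:
# by design, the "correct" 3-bit answer 010 is measured with probability 0.6.
distribution = {"010": 0.6, "000": 0.1, "001": 0.1, "101": 0.1, "111": 0.1}

def measure():
    """One run: collapse to a random 3-bit string according to the distribution."""
    r, acc = random.random(), 0.0
    for outcome, p in distribution.items():
        acc += p
        if r < acc:
            return outcome
    return outcome  # guard against floating-point rounding at the top end

runs = [measure() for _ in range(1000)]
answer, count = Counter(runs).most_common(1)[0]
print(answer, count)  # majority polling picks out the likely-correct value
```

Each individual run is random, but over a thousand runs the skew dominates and the majority answer is almost certainly the correct one — quantum computations are probabilistic, yet reliably so.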
A quantum algorithm is implemented by an appropriate sequence of unitary operations. Note that for a given algorithm, the operations will always be done in exactly the same order. There is no "IF THEN" statement to vary the order, since there is no way to read the state of a qubit before the final measurement. There are, however, conditional gate operations such as the controlled NOT gate, or CNOT.
Quantum computing in computational complexity theory:
This section surveys what is currently known mathematically about the power of quantum computers. It describes the known results from computational complexity theory and the theory of computation dealing with quantum computers.
The class of problems that can be efficiently solved by quantum computers is called BQP, for "bounded error, quantum, polynomial time". Quantum computers only run randomized algorithms, so BQP on quantum computers is the counterpart of BPP on classical computers. It is defined as the set of problems solvable with a polynomial-time algorithm, whose probability of error is bounded away from one quarter (Nielsen & Chuang 2000). A quantum computer is said to "solve" a problem if, for every instance, its answer will be right with high probability. If that solution runs in polynomial time, then that problem is in BQP.
BQP is suspected to be disjoint from NP-complete and a strict superset of P, but that is not known. Both integer factorization and discrete log are in BQP. Both of these problems are NP problems suspected to be outside BPP, and hence outside P. Both are suspected to not be NP-complete. There is a common misconception that quantum computers can solve NP-complete problems in polynomial time. That is not known to be true, and is generally suspected to be false.
An operator for a quantum computer can be thought of as changing a vector by multiplying it with a particular matrix. Multiplication by a matrix is a linear operation. It has been shown that if a quantum computer could be designed with nonlinear operators, then it could solve NP-complete problems in polynomial time. It could even do so for #P-complete problems. It is not yet known whether such a machine is possible.
Although quantum computers are sometimes faster than classical computers, ones of the types described above can't solve any problems that classical computers can't solve, given enough time and memory (albeit possibly an amount that could never practically be brought to bear). A Turing machine can simulate these quantum computers, so such a quantum computer could never solve an undecidable problem like the halting problem. The existence of "standard" quantum computers does not disprove the Church-Turing thesis (Nielsen and Chuang 2000).
Very recently, some researchers have begun to investigate the possibility of using quantum mechanics for hypercomputation - that is, solving undecidable problems. Such claims have been met with very considerable skepticism as to whether it is even theoretically possible; see the hypercomputation article for more details.
References:
  • David P. DiVincenzo (2000). "The Physical Implementation of Quantum Computation". Experimental Proposals for Quantum Computation. arXiv:quant-ph/0002077.
  • D. P. DiVincenzo (1995). "Quantum Computation". Science 270 (5234): 255–261. Table 1 lists switching and dephasing times for various systems.
  • Richard Feynman (1982). "Simulating Physics with Computers". International Journal of Theoretical Physics 21: 467.
  • Gregg Jaeger (2006). Quantum Information: An Overview. Berlin: Springer. ISBN 0-387-35725-4.
  • Michael Nielsen and Isaac Chuang (2000). Quantum Computation and Quantum Information. Cambridge: Cambridge University Press. ISBN 0-521-63503-9.


Contributed by: Abhishek Mandal, Dept. of Computer Science & Engineering, Mallabhum Institute of Technology

Thursday, September 30, 2010

Tasks Scheduling Algorithm for Multiple Processors with Dynamic Reassignment

Distributed computing systems [DCSs] offer the potential for improved performance and resource sharing. To make the best use of the computational power available, it is essential to assign tasks dynamically to the processor whose characteristics are most appropriate for executing them in a distributed processing system. We have developed a mathematical model for allocating M tasks of a distributed program to N processors (M > N) that minimizes the total cost of the program. Relocating tasks from one processor to another at certain points during the execution of the program, which contributes to the total cost of the running program, has been taken into account. Phasewise execution cost [EC], intertask communication cost [ITCC], residence cost [RC] of each task on different processors, and relocation cost [REC] for each task have been considered while preparing the dynamic task allocation model. The present model is suitable for an arbitrary number of phases and processors with random program structure.

Consider a distributed program consisting of a set T = {t1, t2, t3, …, tM} of M tasks to be allocated to a set P = {p1, p2, p3, …, pN} of N processors, with execution divided into K phases. The basis for the dynamic program model is the concept of the phases of a task program. With each phase the following information is associated:
(1) The executing task during this phase and its execution cost on each processor in a heterogeneous system.

(2) Residence costs of the remaining tasks, other than the executing task, on each processor. These costs may come from the use of storage.

(3) Intertask communication cost between the executing task and all other tasks if they are on different processors.

(4) A relocation cost for reassigning each task from one processor to another at the end of each phase.

In general, the objective of task assignment is to minimize the completion cost of a distributed program by properly mapping the tasks to the processors. The cost of an assignment A, TCOST(A), is the sum of execution, intertask communication, residence, and relocation costs.
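A minimal Python sketch of that cost sum, under the simplifying assumptions of a flat relocation cost and tiny made-up cost tables (none of these numbers come from the model itself; they only show how the four components add up):

```python
# Minimal sketch of TCOST(A) for a phased assignment: per phase, sum the
# executing task's execution cost, the residence costs of the other tasks,
# intertask communication across processors, and relocation between phases.
# All cost tables below are illustrative, not taken from the model in the post.

EC = {("t1", "p1"): 5, ("t1", "p2"): 8,    # execution cost per (task, processor)
      ("t2", "p1"): 7, ("t2", "p2"): 4}
RC = {("t1", "p1"): 1, ("t1", "p2"): 2,    # residence cost per (task, processor)
      ("t2", "p1"): 2, ("t2", "p2"): 1}
ITCC = {("t1", "t2"): 3}                   # comm cost if the pair is split
REC = 2                                    # assumed flat relocation cost per move

def phase_cost(assign, executing):
    cost = EC[(executing, assign[executing])]
    cost += sum(RC[(t, p)] for t, p in assign.items() if t != executing)
    for (a, b), c in ITCC.items():
        if assign[a] != assign[b]:         # pay comm cost only across processors
            cost += c
    return cost

def tcost(phases):
    """phases: list of (assignment dict, executing task), one entry per phase."""
    total, prev = 0, None
    for assign, executing in phases:
        total += phase_cost(assign, executing)
        if prev is not None:               # relocation cost for tasks that moved
            total += REC * sum(prev[t] != assign[t] for t in assign)
        prev = assign
    return total

plan = [({"t1": "p1", "t2": "p2"}, "t1"),  # phase 1: tasks split, t1 executes
        ({"t1": "p1", "t2": "p1"}, "t2")]  # phase 2: t2 relocated, then executes
print(tcost(plan))
```

Minimizing TCOST(A) then becomes a search over such phase-by-phase assignments, trading relocation cost against cheaper execution, residence, and communication in later phases.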


Contributed by: Arup Roy, Lecturer (CSE), Mallabhum Institute of Technology

Saturday, September 18, 2010

Microprocessors Operating at below 1 volt!!

TI's MSP430 family, we already know, is one of the lowest-power microcontroller families. Now comes news of a new device that takes this to new lows! The MSP430L092 operates below 1 volt, at 0.9 V precisely. What's more remarkable is that all the analog and digital parts work at this voltage; there is no voltage-boosting circuitry on chip.


The new member can work in the 0.9 to 1.65 V range. The core is a 16-bit RISC running at 4 MHz, and the power consumption is just 45 µA/MHz!!!! 2 KB of RAM and 2 KB of ROM make up the memory system. There are several timer peripherals, such as the 32-bit watchdog timer and two 16-bit general-purpose timers. The device has 11 I/Os that are also interrupt-capable. There's also an Analog Pool IP with several new features: the Analog Pool, or A-POOL, has basic blocks such as a 256 mV voltage reference, a comparator, and an 8-bit DAC.


Like other members of the family this too is well suited to create a highly portable instrument that lasts and lasts on batteries.

Thursday, September 16, 2010

Concept of Image Processing Based Architecture

Nowadays, everything from microcomputers to general-purpose large computers is used in image processing. Dedicated image processing systems connected to host computers are very popular, and special coprocessor cards and parallel processors are being included in many small systems to gain speed. Interactive graphic devices are also added to provide image editing facilities. Digitized image arrays are in most cases very large, so sufficiently large core memory should be provided with the system, and these working systems should have adequate and efficient large secondary storage; here, magnetic tapes and disks are the most popular storage media. Image processing programs are often coded in assembly language for fast execution, but the flexibility of the system can be improved by having high-level languages for use in the development phase. Depending on requirements, various image processing architectures are designed and available in the market. For example, machines for scientific research differ from commercial ones, since solving a particular problem may need a special architecture; on the other hand, in industrial applications a machine needs to do a particular job in real time. Accordingly, four major distinctions are made:
1. Scientific research and commercial machines
2. Real-time and off-line machines
3. Machines for imaging and machine vision jobs
4. Machines for process control and inspection
Most of the image processing hardware is based on one of the following architectural concepts:
i) Serial or von Neumann architecture: a low-cost traditional serial processor based on a microprocessor chip with a complex instruction set (CISC) or reduced instruction set (RISC).
ii) Multiple-instruction, multiple-data (MIMD) microprocessors: a small array of RISC or CISC elements, characterized by interconnections among the processors as well as interconnections between processors and memory elements.
iii) Pipelines: also an MIMD architecture, where identical processors are connected in a sequence. The algorithm is decomposed and mapped onto this sequence such that each processor executes only its own sub-task, in order. Image data go in at one end at frame rate, pass from one programmable module to the next, and the resultant image comes out of the other end at the same rate.
iv) Single-instruction, multiple-data (SIMD) parallel processors: these usually operate on image bit-planes in parallel, in which case the design is called single-bit SIMD. However, they may be designed to operate on a whole byte or a complete word as well.
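The pipeline organization in (iii) can be sketched in Python, with each "processor" reduced to a function stage and a frame flowing through the stages in order. The stages themselves are illustrative stand-ins for real image operations.

```python
# Sketch of a pipelined image architecture: the algorithm is decomposed into
# sub-tasks, each "processor" (here, a plain function) applies one stage, and
# every frame flows through the stages in order. Stages are illustrative only.

def subtract_offset(frame):          # stage 1: remove a fixed black level
    return [max(0, p - 10) for p in frame]

def threshold(frame, t=100):         # stage 2: binarize against a threshold
    return [255 if p >= t else 0 for p in frame]

def invert(frame):                   # stage 3: produce the negative image
    return [255 - p for p in frame]

PIPELINE = [subtract_offset, threshold, invert]

def run_pipeline(frame):
    for stage in PIPELINE:           # frame enters one end, exits the other
        frame = stage(frame)
    return frame

print(run_pipeline([5, 50, 120, 250]))  # a tiny 1-D "frame" of gray values
```

In real pipelined hardware the stages run concurrently on successive frames, so once the pipeline fills, a finished frame emerges every frame period even though each frame passes through all three stages.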

Contributed by: Biswajit Halder

Thursday, September 9, 2010

Jolicloud -Lightweight OS for Netbooks

             
Netbooks are not like fully powered notebooks, and thus working with Windows on netbooks can be irritating at times. There is a very lightweight operating system designed specifically for netbooks, called Jolicloud. We all know that Windows Vista didn't work well even on full-scale notebooks; it is far too slow for low-powered netbooks. That is one of the reasons most netbooks come with either Windows XP or some flavor of Linux.

Jolicloud may remind us of smartphones because it is application-centric. It comes with many applications already installed, like Facebook, Google Reader (a must for bloggers), Times Skimmer, Google Docs, Gmail, and even Dropbox. If we want, we can also add any supported application in the blink of an eye.

The architecture of Jolicloud consists of mainly three components: the kernel; the user interface, which is quite impressive; and finally the Jolicloud homebase. This homebase actually lives outside the netbook and helps in linking netbooks together, meaning any update on one of your netbooks will automatically be made available on the other linked netbooks. Jolicloud is based on GNU/Linux, Kernel.org, Ubuntu, and Debian, optimized and extended for netbooks.

GMA 500 support was not optimal in Linux. The driver is developed by Tungsten Graphics, not by Intel, and the graphics core is not an Intel one but is licensed from PowerVR. This led to an uncertain mix of open- and closed-source 3D accelerated drivers, instability, and lack of support. Ubuntu is the Linux distribution that best supports the GMA 500, through the use of ubuntu-mobile; however, the installation procedure is not as simple as for other drivers and can lead to many bugs. Jolicloud has a driver for the GMA 500 built in. PixieLive, a GNU/Linux live distribution optimized for GMA 500 netbooks, can boot from a USB pen drive, SD card, or hard disk.

What Jolicloud did was integrate Ubuntu version 8.04 with the Jolicloud kernel and then integrate the Poulsbo DRI with the native kernel DRI, along with adding some libraries and packages. The result was that Jolicloud became a standard distribution ISO that supports the GMA 500 along with DRI and DRM.
Contributed by: Swagata Nath, Class of 2010, CSE, MIT