The channel theory


I am waiting to get my hands on my new Oculus Rift (I also have the first release, the DK1), and I was thinking about how to tell whether a new device will be a success.

There is a parameter that is normally underestimated, or not taken into account at all, when new devices are evaluated.

It is the size of the communication channel between the user and the device. This is one of the most important indices, perhaps the most important, for estimating how much of an advance a technology will be.

For the Oculus Rift, and for virtual reality devices in general, the increase in the communication channel is large. The user gets one image per eye, and each image can saturate the input channel of that eye: it is possible to build a device whose image definition is so high that the eye cannot detect any more detail, and which also completely covers the field of view of each eye. A monitor cannot reach the same saturation even with ten or a hundred times the resolution, because the user is not in a fixed position and probably 50% or more of the information output by a monitor is lost.
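To get a feeling for what "saturating the eye" could mean in pixels, here is a rough sketch. The numbers are assumptions of mine and not measurements: about 60 pixels per degree (roughly 1 arcminute of detail) as the limit of the eye, and a field of view of about 150 x 120 degrees per eye. It also treats the field of view as flat, which a real headset lens does not do.

```python
def pixels_to_saturate(fov_h_deg, fov_v_deg, px_per_deg):
    """Pixels needed per eye so that more resolution would not be visible.

    Linear approximation: pixels = degrees * pixels-per-degree on each axis.
    All three inputs are assumptions supplied by the caller.
    """
    width = round(fov_h_deg * px_per_deg)
    height = round(fov_v_deg * px_per_deg)
    return width, height, width * height


# Assumed values, for illustration only.
ASSUMED_PX_PER_DEG = 60          # about 1 arcminute per pixel
ASSUMED_FOV_DEG = (150, 120)     # horizontal, vertical, per eye

width, height, total = pixels_to_saturate(*ASSUMED_FOV_DEG, ASSUMED_PX_PER_DEG)
print(width, height, total)
```

Under these assumptions the channel for each eye has a finite size that a headset can in principle fill, because the display always covers the same part of the visual field.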

That is only an analysis of the 2D component, but a virtual reality device outputs stereoscopic images that carry depth information, so it adds a 3D component.

The last component is movement: every head movement or rotation gives a different image, which means another large increase in the information the user receives.

So the increase in the information given to the user by a VR device is so large compared to monitors that every side effect is secondary.

This means that it will not be only a nerdy device; in the future it will reach a very wide range of users.

Following this analysis, what I suggest is not to put too much effort into solving the sickness side effect. It is only a problem for people who are not yet trained to use the device (something like giving a mouse to someone for the first time). What really matters is the size of the channel, so the direction of development should be to increase the definition: increase the resolution!


Parallelizing again

This is the latest release I compiled of the Cellular Automata 1D evolver, which runs on the GPU with CUDA.


This is more than a year old and was optimized for a GPU with about 500 cores. The new NVIDIA GPUs have “only” about 2500 cores, “only” a factor of 5, but for the same price it is possible to buy 4000~5000 cores with AMD GPUs like the 7xxx and, soon, the 8xxx series. OpenCL is also advancing, with support from AMD, NVIDIA and Intel. I am not sure it is possible to reach the same computational power with OpenCL, and I am sure there are problems in how the different brands implement the language, so it is not easy to write OpenCL programs that work on different GPUs of the same brand, let alone across brands. But OpenCL lets me work with different hardware, and perhaps I can reach a factor of 10x with it (my main doubt is whether I can implement the synchronization tricks I use in CUDA, tricks that give me a 5x speedup!).

The first AMD card I bought is the Gigabyte 7970.


With its 2048 overclockable cores it can do about 6~7 times the computational work of my good GeForce 480 (a very good card: it worked night and day for years without hesitation).

OK, 6x faster is not enough for me, not enough to justify reimplementing everything, so my plan is to buy another card, a 7970 or an 8970 when available, and to work with a minimum of 2048+2560 (or ~2300) cores and a speedup of about 10x. This multi-GPU configuration gives me the opportunity to implement another level of parallelization.

The current implementation of the evolver works at the multicore level, where the memory of the GPU is shared among all the cores (there are different levels of shared memory). This lets each thread compute its value by reading the results of other threads (every cell is the result of 3 cells of the previous generation). Managing the computation without shared memory makes it possible to expand the system to many levels of parallelization.
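As a reminder of the basic rule, here is a minimal sketch in Python (not the CUDA code of the evolver). It assumes a binary automaton, a rule given as a Wolfram rule number from 0 to 255, and a row that wraps around like a ring; these choices and the name `step` are mine, for illustration.

```python
def step(cells, rule):
    """Return the next generation of a 1D binary cellular automaton.

    cells: list of 0/1 values, treated as a ring (assumed boundary condition).
    rule:  Wolfram rule number 0..255 (assumed encoding of the update table).
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        # Each new cell depends on 3 cells of the previous generation.
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (centre << 1) | right
        nxt.append((rule >> index) & 1)
    return nxt


row = [0, 0, 0, 1, 0, 0, 0]
print(step(row, 90))  # prints [0, 0, 1, 0, 1, 0, 0]
```

On the GPU every cell of `nxt` can be computed by a different thread, but each thread has to read 3 values written by other threads in the previous step, which is why shared memory and synchronization matter.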


The image above shows several triangles, each one representing a computational job; the information a job shares with the others is the perimeter of its triangle. To be computed, a red triangle needs the base of a yellow triangle; the computation then proceeds row by row with a shrinking width, so no other information is required. The base of a red triangle can be computed by a yellow triangle, which needs the information of its 2 sides, computed by 2 red triangles. So we have a dependency where each red triangle needs one yellow triangle, which in turn needs 2 red triangles.
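The shrinking kind of triangle can be sketched like this in Python. It assumes the same binary 3-cell rule encoded as a Wolfram rule number; the names `next_cell` and `shrinking_triangle` are made up for the example and do not come from the evolver.

```python
def next_cell(left, centre, right, rule):
    """Value of one cell from the 3 cells above it (Wolfram rule number assumed)."""
    return (rule >> ((left << 2) | (centre << 1) | right)) & 1


def shrinking_triangle(base, rule):
    """Compute every row that depends only on `base`.

    Each row is 2 cells narrower than the one before, because the cells at
    the two ends would need neighbours that are outside the triangle.
    """
    rows = [list(base)]
    while len(rows[-1]) >= 3:
        prev = rows[-1]
        rows.append([next_cell(prev[i], prev[i + 1], prev[i + 2], rule)
                     for i in range(len(prev) - 2)])
    return rows


for row in shrinking_triangle([1, 0, 1, 1, 0, 0, 1, 0], 110):
    print(row)
```

The job needs only its base as input and produces its two sides as output, so it can run on a unit that shares no memory with the others.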

The size of the triangles depends on the power of the computational units and on the power of the transmission channels. It is also possible to recursively split a triangle into sub-triangles, which can be useful when there are several levels of computational units (multi-GPU, multi-PC, computing grid).

Given a triangle with a base of size B, its perimeter is B*2 cells, and this is the size of the communication in/out for this triangle. The number of cells computed in the triangle is about (B/2)^2.
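A quick way to check these two formulas is to count the cells directly. The sketch below assumes an even base B and rows that shrink by 2 cells down to a row of 2, and it counts every cell of the triangle including the base; `triangle_counts` is a hypothetical helper. The exact counts differ from B*2 and (B/2)^2 by small terms that stop mattering when B is large.

```python
def triangle_counts(B):
    """Return (cells, perimeter) for a triangle with even base B.

    Row widths are B, B-2, ..., 2. The perimeter is the whole base row
    plus the two end cells of every row above it.
    """
    widths = list(range(B, 0, -2))
    cells = sum(widths)
    perimeter = B + 2 * (len(widths) - 1)
    return cells, perimeter


for B in (8, 1024):
    cells, perimeter = triangle_counts(B)
    print(B, cells, (B // 2) ** 2, perimeter, 2 * B)
```

For B = 1024 the exact count is 262656 cells against the estimate of 262144, and the perimeter is 2046 cells against 2048.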

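So, given that C cells can be computed in the time it takes to communicate 1 cell, the base of the triangle should be B=8*C: computing the (B/2)^2 cells then takes as long as communicating the B*2 cells of the perimeter, because (B/2)^2 = B*2*C gives B = 8*C. With this size there is no idle time due to synchronization.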
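The balance behind B=8*C can be checked with a few lines of Python. Time is measured in units of "one cell communicated", and C is the number of cells computed in that unit. The function names are mine, and the model is the simple one of this post (no latency, no overlap effects).

```python
def compute_time(B, C):
    """Time to compute the (B/2)^2 cells of a triangle, in communication units."""
    return (B / 2) ** 2 / C


def communication_time(B):
    """Time to move the B*2 perimeter cells, one unit per cell."""
    return 2 * B


def balanced_base(C):
    """Base size at which computing and communicating take the same time."""
    return 8 * C


C = 100  # assumed ratio, for illustration only
for B in (balanced_base(C) // 2, balanced_base(C), balanced_base(C) * 2):
    print(B, compute_time(B, C), communication_time(B))
```

With a smaller base the unit finishes computing before the communication is done and has to wait; with a larger base the computation dominates and the channel is the one waiting.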