Augmented Reality Sixth Sense Based Implementation Computer Science Essay

Some deaf people can be helped to hear with hearing aids, and many blind people can be helped to see through corrective devices or surgery. But what about people who cannot speak? Are they cursed? Obviously not. It is possible to help them speak with our implementation model, which relies on Augmented Reality and Sixth Sense technologies. In this paper we propose our own research model that lets mute people speak simply by using hand gestures, in a manner similar to the Sixth Sense device. A camera acts as a digital eye to recognize what the user is trying to convey to the real world. Behind this is a digital image processing algorithm that differentiates the various hand gestures and selects the appropriate text template for the message. A text-to-speech conversion algorithm then voices the message through a loudspeaker, allowing the mute user to communicate artificially.

Index Terms: Augmented Reality, Sixth Sense, image processing, gesture recognition, artificial speaking aid.

What is AR?

Augmented reality (AR) is a field of computer science that involves combining the physical real world with an interactive, 3D virtual world. AR technology blurs the line between what is real and what is computer-generated by enhancing what we see, hear, feel and smell. The goal of augmented reality is to add information and meaning to a real object or place. It enhances the user's perception by adding information to the real world, and thereby helps people accomplish significant real-world tasks. Unlike virtual reality, augmented reality does not create a simulation of reality. Instead, it takes a real object or space as the foundation and incorporates technologies that add contextual data to deepen a person's understanding of the subject.

Sixth Sense Using AR

Unlike other complex systems in the technology world, Sixth Sense based systems use only a few minimal components that are very affordable compared to other costly devices, as follows.


Small projector and mirror


Some Colored Markers

These components are usually tied together in a defined way to make a pendant-like wearable device. It augments the real world with the virtual world by enabling us to interact directly with the digital world using only hand gestures. It is also notable because the projector essentially turns any surface into an interactive screen. Basically, the device works by using the camera and mirror to examine the outside world and feeding that image to the phone, which processes the image, gathers GPS coordinates and retrieves data from the Internet. The projector then projects the resulting information onto the surface in front of the user, whether it is a wrist, a wall, or even a person. The device costs only around US$350 according to an estimate of the component costs. The logic and algorithms behind it are tedious and complex, but their output is very effective in solving real-world problems.

Our Proposed Model

3.1. Components and Architecture

In our proposed model, the components are more or less the same as those of the Sixth Sense device, with some changes to suit our desired application. The major components of our system include:

3.1.1. Hardware Components

Portable Computing Device

Mirror and Projector


Colored Markers


Memory Chip

3.1.2. Software Components

Motion Detection Algorithm

Collection of Dialogue Templates

Text to Speech Conversion Algorithm.

Input signals

This artificial speaking aid has to be given some inputs to make the system usable. The input to the system is, in turn, simply the input to the components of the system. The camera detects the natural hand gestures of the user, determines their respective positions and basic forms, and gives the resulting image to the computing device for generating events.

The input to the text-to-speech conversion algorithm is the text template selected by the computing device. The audio signals generated from the speech templates are given as input to the speaker unit, which speaks them to the real world.

The main input to the projector comes from the computing device (the smartphone itself). The projector projects the user interface of the system (the speaking aid) for controlling its options; it can be projected onto any surface, which then serves as a touch screen.

Hand Gesture Recognition

The algorithm behind this is tedious in the case of the Sixth Sense device. To implement our own proposed model, we used only a simple algorithm that detects some simple hand gestures for the sake of demonstration. The hand movements are detected by the camera, and successive snapshots are taken when the hand is shown with five fingers and the thumb at the left. The predefined gesture images are then compared with the snapshots one by one. If any gesture matches, the speech template assigned to it is selected by the computing device and passed to the next component.

The main thing to note here is the database, which contains the gesture images and the speech templates associated with them. In addition, when a designated gesture is made, it should cause the computing device to send input to the projector, which projects the user interface screen for controlling the speaking aid options. If a hand gesture does not match any gesture stored in the database, no template can be selected, which is a problem. The user should therefore be trained to use the system through some instruction.

For example, consider the recognition of a simple gesture of one hand (the right hand) below.

Here the hand is recognized by analysing the negative view from the camera: the lighter regions of the image are marked and then compared using statistical analysis of the spot positions and colour combinations of the lighter spots that exceed some threshold values. The recognized image is then checked against the images stored in the gesture image database.
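The marking of lighter regions can be illustrated with a minimal sketch. It assumes a grayscale frame stored as a list of rows of 0-255 values; the function name `bright_mask` and the threshold of 200 are hypothetical choices for illustration, not values from the system described here.

```python
def bright_mask(gray, threshold=200):
    """Return a 0/1 mask marking pixels at or above the brightness threshold.

    `gray` is a list of rows of 0-255 grey levels (an assumed input format).
    """
    return [[1 if px >= threshold else 0 for px in row] for row in gray]


# Tiny illustrative frame: the bright pixels form an L-shaped spot.
frame = [
    [10, 220, 230],
    [12, 210, 15],
    [11, 13, 14],
]
mask = bright_mask(frame)
```

The resulting mask keeps only the lighter spots, which is the form of input that the later region-finding steps work on.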

Here the gestures themselves act as separate events when mapping speech templates to gestures. As in event-driven programming, where events are given IDs, the gestures are recognized by their distinct IDs.

This helps in selecting the speech template to be conveyed to the real world.

Rather than storing only predefined images of hand gestures, the system lets users choose their own hand gestures, and matching a speech template to each gesture is up to the user. This helps users remember the actions and gestures easily, since they chose them themselves. It also helps the system resolve conflicts between gesture images.

Image Processing Algorithm used: Blob Colouring Algorithm


Easy and fast detection within acceptable frame rates (~2 fps and above).

Easing the Detection

To make the detection easier and faster, it was decided to concentrate solely on the colour/brightness of a certain object (instead of detecting the user's actual finger).


Find regions of a pre-defined colour/brightness within an image.


A "backwards L" shaped template is passed over the whole image from left to right and top to bottom.

"Backwards L" shaped template

For each pixel, calculate the distance:

d1 between itself and its left neighbor.

d2 between itself and its upper neighbor.

Definition: "Distance between two pixels"

Grayscale: difference between the grey levels.

RGB: Euclidean distance E_RGB in the RGB colour space.

HSI: difference between chromaticity or intensity.
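The three distance definitions can be written out as a short sketch. The function names are hypothetical, and treating the HSI chromaticity difference as a hue angle in degrees with wrap-around is an assumption made for illustration; the text does not specify it.

```python
import math


def gray_distance(a, b):
    """Absolute difference between two grey levels."""
    return abs(a - b)


def rgb_distance(p, q):
    """Euclidean distance between two (R, G, B) tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def hue_distance(h1, h2):
    """Difference between two hue angles in degrees.

    Assumption: hue is an angle, so 350 and 10 are 20 degrees apart.
    """
    d = abs(h1 - h2) % 360
    return min(d, 360 - d)
```

For instance, `rgb_distance((0, 0, 0), (3, 4, 0))` evaluates to `5.0`, and each of these values would be compared against the threshold T used in the steps that follow.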

Result Interpretation

A pixel is considered to belong to a different region if the distance di between it and the adjacent pixel is greater than a certain threshold T.

Case 1: (d1 > T) and (d2 > T)

Pixel is different from both neighbours => assign to a new region.

Case 2: (d1 < T) and (d2 > T)

Pixel is different from the upper neighbour, but similar to the left neighbour => assign to the same region as the left neighbour.

Case 3: (d1 > T) and (d2 < T)

Pixel is different from the left neighbour, but similar to the upper neighbour => assign to the same region as the upper neighbour.

Case 4: (d1 < T) and (d2 < T)

Pixel is similar to both neighbours => assign it to the same region as the neighbours.


Case 4 is problematic:

Current pixel is similar to both neighbours, but the regions of the neighbours differ.

Both neighbour regions differ but are equivalent, because the current pixel connects them.

Currently examined template => "red line"

Current pixel => "red"

Pixels in region "1" => "green"

Pixels in region "2" => "blue".
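The four cases, including the problematic one, can be sketched as a single raster scan. This is a minimal illustration under stated assumptions: the image is grayscale (a list of rows), a distance equal to T counts as "similar", a missing neighbour at the image border counts as "different", and the name `label_regions` is hypothetical.

```python
def label_regions(gray, T):
    """First pass of blob colouring over a grayscale image.

    Returns (labels, equivalences): one region label per pixel, plus the
    label pairs found equivalent in case 4 when the neighbours' regions differ.
    """
    height, width = len(gray), len(gray[0])
    labels = [[0] * width for _ in range(height)]
    equivalences = []
    next_label = 1
    for y in range(height):
        for x in range(width):
            # d1: distance to left neighbour, d2: distance to upper neighbour.
            left_same = x > 0 and abs(gray[y][x] - gray[y][x - 1]) <= T
            up_same = y > 0 and abs(gray[y][x] - gray[y - 1][x]) <= T
            if not left_same and not up_same:    # case 1: new region
                labels[y][x] = next_label
                next_label += 1
            elif left_same and not up_same:      # case 2: join left region
                labels[y][x] = labels[y][x - 1]
            elif up_same and not left_same:      # case 3: join upper region
                labels[y][x] = labels[y - 1][x]
            else:                                # case 4: similar to both
                labels[y][x] = labels[y][x - 1]
                if labels[y - 1][x] != labels[y][x - 1]:
                    # Record the equivalence instead of renumbering now.
                    equivalences.append((labels[y - 1][x], labels[y][x - 1]))
    return labels, equivalences


image = [
    [9, 0, 9],
    [9, 9, 9],
]
labels, equivalences = label_regions(image, T=1)
```

In this small image the two bright areas of the top row receive different labels and are only connected by the bottom row, so the scan records one equivalent pair rather than relabelling pixels it has already visited.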

Equivalent Region Problem


A 2D integer array is used to store the region number for each pixel. If the fourth case occurs, renumbering the whole integer array (because two regions are equivalent) is very time consuming, especially because this can happen more than once.

Pre-processing: All pixels that do not belong to the defined colour/brightness range are removed from the image (=> colour set to "black"). Only the colours we are searching for are now present, so the actual distance no longer needs to be calculated: it is either "0" (colour -> colour) or "1" (colour -> black).


An equivalence map is used instead of renumbering the integer array.


Region equivalence map

Problem: If it is discovered that region "2" is equivalent not only to "1" but also to "3", this information would be overwritten. If renumbering takes place immediately, the processing time rises again.

Region equivalence trees. Region equivalence table before and after flattening.
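One common way to keep several equivalences per region without overwriting them is a parent-link tree that is flattened once after the scan. The sketch below is a standard union-find style structure offered as an assumption about how the trees and flattening could work; the function names are hypothetical and the exact structure used by the system is not given in the text.

```python
def find_root(parent, label):
    """Follow parent links until reaching a label that is its own parent."""
    while parent[label] != label:
        label = parent[label]
    return label


def merge(parent, a, b):
    """Record that regions a and b are equivalent by linking their roots."""
    root_a, root_b = find_root(parent, a), find_root(parent, b)
    if root_a != root_b:
        # Keep the smaller label as the root so the result is deterministic.
        parent[max(root_a, root_b)] = min(root_a, root_b)


def flatten(parent):
    """Map every label directly to its root (done once, after the scan)."""
    return {label: find_root(parent, label) for label in parent}


parent = {1: 1, 2: 2, 3: 3, 4: 4}
merge(parent, 2, 1)  # region 2 is equivalent to region 1
merge(parent, 2, 3)  # region 2 is also equivalent to region 3
table = flatten(parent)
```

Because merging links roots rather than overwriting a single map entry, the second equivalence does not erase the first, and the per-pixel array is renumbered only once using the flattened table.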

Determining Which Blob to Use

Problem: The biggest blob is not always the one that should be detected (e.g. a large reflection from the surface).

Blob Criteria

Absolute ratio between the width and height of the blob bounding box. Minimal region size in pixels.

Bounding box width-height ratio
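The two criteria can be sketched as a simple filter over a blob's pixel coordinates. The function names, the minimum size of 4 pixels and the maximum ratio of 2.0 are hypothetical, and reading "absolute ratio" as the larger side divided by the smaller side is an assumption.

```python
def bounding_box(pixels):
    """Return (width, height) of the box enclosing a list of (x, y) pixels."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return max(xs) - min(xs) + 1, max(ys) - min(ys) + 1


def is_candidate(pixels, min_size=4, max_ratio=2.0):
    """Accept a blob only if it is large enough and roughly compact."""
    if len(pixels) < min_size:
        return False
    width, height = bounding_box(pixels)
    # Orientation-independent ratio: larger side over smaller side.
    ratio = max(width, height) / min(width, height)
    return ratio <= max_ratio


marker = [(0, 0), (1, 0), (0, 1), (1, 1)]   # compact 2x2 blob
reflection = [(x, 0) for x in range(10)]    # long, thin streak
speck = [(5, 5)]                            # single noisy pixel
```

With these illustrative settings the compact marker passes, while the elongated reflection and the single-pixel speck are rejected even though the reflection is the largest blob.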

Selection of Speech Templates

From the large number of speech templates, one particular template must be selected for each distinct gesture, based on the IDs of the gesture images and the IDs of the templates.

The two should be mapped together by the computing device, and the selected speech template is then given as input to the text-to-speech conversion algorithm. The gesture-to-text mapping can be done in many ways; for example, every hand gesture is given a unique ID, and these IDs are mapped to predefined text templates. If there is a match between the snapshot taken and a gesture in the database, the corresponding speech template is selected.
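The ID-based mapping can be illustrated with two lookup tables. All IDs and phrases below are hypothetical placeholders, and `select_template` is an illustrative name; a real deployment would fill these tables from the user's own gestures and templates.

```python
# Hypothetical speech templates, keyed by template ID.
SPEECH_TEMPLATES = {
    "T1": "Hello, how are you?",
    "T2": "I need some water, please.",
}

# Hypothetical mapping from recognized gesture ID to template ID.
GESTURE_TO_TEMPLATE = {
    "G1": "T1",
    "G2": "T2",
}


def select_template(gesture_id):
    """Return the speech template for a gesture ID, or None if unmapped."""
    template_id = GESTURE_TO_TEMPLATE.get(gesture_id)
    if template_id is None:
        return None
    return SPEECH_TEMPLATES[template_id]
```

Returning `None` for an unknown gesture makes the no-match case explicit, so the caller can decide whether to stay silent or prompt the user to repeat the gesture.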

Text to Speech Conversion

The selected speech template is then given to a text-to-speech converter; such converters are freely available on the Internet. In this step, the speech template is converted into the corresponding audio signals, which are used as input to the speaker unit that conveys the message aloud to the real world.

UI for Controlling Options Using the Projector

The system has user interface options for controlling its behaviour, such as the volume of the speakers, the speaking frequency of the speaker, and accent and language controls. These are adjusted through the user interface displayed by the projector using Augmented Reality concepts, with the help of the markers and their position coordinates. This can even be used to enhance the usability of smartphones by giving them an Augmented Reality touch.


The system model we have proposed here is a new one that uses the Sixth Sense device and serves both as an enhancement to the device and as an artificial speaking aid for the mute. Users can therefore use this device as a Sixth Sense device, as a mobile phone, and as the proposed speaking aid. Because it builds on the broader fields of Augmented Reality and Sixth Sense, the options for further enhancement are wide open.