## 2016/11/21

### C vs C++, performance on AVR

The aim of this post is to challenge the widespread belief that C++ is too slow a language for embedded environments. According to this belief, microcontrollers should still be programmed in C, or even in assembler. You probably don't agree with me right now; the idea that C is much more efficient than C++ is so widespread that debating it almost seems like sacrilege. That's why I'm about to make a series of comparisons between both languages, backed by real, objective numbers (code size, execution time, etc.). Once we've shown that C++ can compete with good old C, we'll see that it's actually the better alternative. For that, besides performance metrics, I will compare things like safety, code readability and portability.

The platform I'm using for the benchmarks will be an AVR microcontroller. Specifically, an ATmega328p, because it is so widely used, and because it's the base platform of Arduino, which many people point to as an example of C++ being slow (more on this later).
To be as fair as possible, I'm also going to take a reference from the opposite extreme: the Atmel Software Framework, in C. Given that Atmel is the manufacturer of AVRs, and that they claim their libraries are optimized by experts both for code size and performance (here), they should make a good benchmark.
As said above, on the C++ side we'll use Arduino for reference. Arduino doesn't provide the best performance, but it is a great example of usability, and thus will be perfect to illustrate a different point.
In between the two, I'll develop a small toy framework to show that it's perfectly possible, using modern C++, to build libraries as easy to use as Arduino without giving up C-level performance (or even assembler-level performance).

Step 0: Environment and configuration.

I'll run all tests using Atmel Studio, compiling with GCC, with the C++11 standard enabled and optimizing for size (unless noted otherwise).
Before we start measuring our own code, let's take a look at the code generated by the compiler to initialize its own environment (execution stack, heap, etc.) inside the microcontroller. So, how big is a minimal C or C++ program?

int main(void)
{
    while (1)
    {
    }
}

Surprisingly, the answer is 166 bytes in C and 134 bytes in C++. Things don't start well for C's defenders. Let's not rush, however. We'll just take note of these numbers, so we can later judge the size of our own code more accurately.

Step 1: Light an LED

The microcontroller equivalent of writing a "hello, world" is turning an LED on. This is a very simple task: configure a port and turn on a specific pin, barely touching a couple of registers. In this example, we light the pin that connects to the Arduino Uno's built-in LED. The C version, straight out of Atmel's sample code, would be:

int main(void)
{
    DDRB |= (1 << DDB5);
    PORTB |= (1 << PORTB5);

    while (true);
}

The generated binary occupies only 4 bytes more than an empty program (170 bytes total), and corresponds to the following assembly code.

sbi 0x04, 5    ; DDRB |= (1 << DDB5);
sbi 0x05, 5    ; PORTB |= (1 << PORTB5);

Perfect. One instruction per task. Unbeatable performance. It's worth noting this code compiles as C++ too, and the result is exactly the same. But the goal here is not to prove that C++ can do the same, but that it can do better. For starters, the above code isn't really very readable. Forcing the programmer to deal directly with registers is uncomfortable, error prone and non-portable, and it definitely doesn't help maintainability. Ideally, we would write something like this:

int main(void)
{
    pinMode(LED_BUILTIN, OUTPUT);
    digitalWrite(LED_BUILTIN, HIGH);

    while (true);
}

That's Arduino code. Clean and clear. The problem? Even extracting only the parts of the Arduino library that directly intervene here, the resulting binary is 368 bytes in size, versus the 4 extra bytes of the C version. To understand why, we need to take a look at both libraries. If we do, we will discover that this part of the Arduino library is actually written in plain C!
Let's start by analyzing the first sample, the efficient ASF C code. When we inline the relevant parts of Atmel's headers, the whole program comes to this:

#include <stdint.h>

#if __AVR_ARCH__ >= 100
#  define __SFR_OFFSET 0x00
#else
#  define __SFR_OFFSET 0x20
#endif

#define _MMIO_BYTE(mem_addr) (*(volatile uint8_t *)(mem_addr))
#define _SFR_IO8(io_addr) _MMIO_BYTE((io_addr) + __SFR_OFFSET)

#define DDRB _SFR_IO8(0x04)
#define DDB5 5
#define PORTB _SFR_IO8(0x05)
#define PORTB5 5

int main(void)
{
    DDRB |= (1 << DDB5);
    PORTB |= (1 << PORTB5);

    while (1);
}



Defines. Defines and macros everywhere. The only way to keep performance this tight is with macros and defines, so that the compiler does all the work and no intermediate computation is left at runtime. Defines are unsafe, don't keep type information, and get lost in the preprocessor (so they can't be seen while debugging), and macros are known for hiding pretty obscure bugs. Besides, both creep into all your code and can't be contained in namespaces or anything else. So if a macro conflicts with something else in your code, you are screwed. For a more elaborate discussion of the disadvantages of macros, see Scott Meyers's Effective C++.

Now let's see the Arduino version, which with all relevant code included, looks like this:

/*
 * CppTest.cpp
 *
 * Created: 2016-09-02 14:14:39
 * Author : Technik
 */

#include <avr/io.h>
#include <avr/pgmspace.h>
#include <avr/interrupt.h>

#define HIGH 0x1
#define LOW  0x0

#define INPUT 0x0
#define OUTPUT 0x1
#define INPUT_PULLUP 0x2

#define digitalPinToPort(P) ( pgm_read_byte( digital_pin_to_port_PGM + (P) ) )
#define digitalPinToBitMask(P) ( pgm_read_byte( digital_pin_to_bit_mask_PGM + (P) ) )
#define digitalPinToTimer(P) ( pgm_read_byte( digital_pin_to_timer_PGM + (P) ) )
#define portOutputRegister(P) ( (volatile uint8_t *)( pgm_read_word( port_to_output_PGM + (P))) )
#define portModeRegister(P) ( (volatile uint8_t *)( pgm_read_word( port_to_mode_PGM + (P))) )

#define LED_BUILTIN 13

#define NOT_A_PIN 0
#define NOT_A_PORT 0

#define NOT_AN_INTERRUPT -1

#define NOT_ON_TIMER 0
#define TIMER0A 1
#define TIMER0B 2
#define TIMER1A 3
#define TIMER1B 4
#define TIMER1C 5
#define TIMER2  6
#define TIMER2A 7
#define TIMER2B 8

#define PB 2
#define PC 3
#define PD 4

#ifndef cbi
#define cbi(sfr, bit) (_SFR_BYTE(sfr) &= ~_BV(bit))
#endif

// these arrays map port names (e.g. port B) to the
// appropriate addresses for various functions (e.g. reading
// and writing)
const uint16_t PROGMEM port_to_mode_PGM[] = {
    NOT_A_PORT,
    NOT_A_PORT,
    (uint16_t)&DDRB,
    (uint16_t)&DDRC,
    (uint16_t)&DDRD,
};

const uint16_t PROGMEM port_to_output_PGM[] = {
    NOT_A_PORT,
    NOT_A_PORT,
    (uint16_t)&PORTB,
    (uint16_t)&PORTC,
    (uint16_t)&PORTD,
};

const uint16_t PROGMEM port_to_input_PGM[] = {
    NOT_A_PORT,
    NOT_A_PORT,
    (uint16_t)&PINB,
    (uint16_t)&PINC,
    (uint16_t)&PIND,
};

const uint8_t PROGMEM digital_pin_to_port_PGM[] = {
    PD, /* 0 */
    PD,
    PD,
    PD,
    PD,
    PD,
    PD,
    PD,
    PB, /* 8 */
    PB,
    PB,
    PB,
    PB,
    PB,
    PC, /* 14 */
    PC,
    PC,
    PC,
    PC,
    PC,
};

const uint8_t PROGMEM digital_pin_to_bit_mask_PGM[] = {
    _BV(0), /* 0, port D */
    _BV(1),
    _BV(2),
    _BV(3),
    _BV(4),
    _BV(5),
    _BV(6),
    _BV(7),
    _BV(0), /* 8, port B */
    _BV(1),
    _BV(2),
    _BV(3),
    _BV(4),
    _BV(5),
    _BV(0), /* 14, port C */
    _BV(1),
    _BV(2),
    _BV(3),
    _BV(4),
    _BV(5),
};

const uint8_t PROGMEM digital_pin_to_timer_PGM[] = {
    NOT_ON_TIMER, /* 0 - port D */
    NOT_ON_TIMER,
    NOT_ON_TIMER,
    // on the ATmega168, digital pin 3 has hardware pwm
    TIMER2B,
    NOT_ON_TIMER,
    // on the ATmega168, digital pins 5 and 6 have hardware pwm
    TIMER0B,
    TIMER0A,
    NOT_ON_TIMER,
    NOT_ON_TIMER, /* 8 - port B */
    TIMER1A,
    TIMER1B,
    TIMER2A,
    NOT_ON_TIMER,
    NOT_ON_TIMER,
    NOT_ON_TIMER,
    NOT_ON_TIMER, /* 14 - port C */
    NOT_ON_TIMER,
    NOT_ON_TIMER,
    NOT_ON_TIMER,
    NOT_ON_TIMER,
};

static void turnOffPWM(uint8_t timer)
{
    switch (timer)
    {
#if defined(TCCR1A) && defined(COM1A1)
    case TIMER1A:   cbi(TCCR1A, COM1A1);    break;
#endif
#if defined(TCCR1A) && defined(COM1B1)
    case TIMER1B:   cbi(TCCR1A, COM1B1);    break;
#endif
#if defined(TCCR1A) && defined(COM1C1)
    case TIMER1C:   cbi(TCCR1A, COM1C1);    break;
#endif

#if defined(TCCR2) && defined(COM21)
    case TIMER2:    cbi(TCCR2, COM21);      break;
#endif

#if defined(TCCR0A) && defined(COM0A1)
    case TIMER0A:   cbi(TCCR0A, COM0A1);    break;
#endif

#if defined(TCCR0A) && defined(COM0B1)
    case TIMER0B:   cbi(TCCR0A, COM0B1);    break;
#endif
#if defined(TCCR2A) && defined(COM2A1)
    case TIMER2A:   cbi(TCCR2A, COM2A1);    break;
#endif
#if defined(TCCR2A) && defined(COM2B1)
    case TIMER2B:   cbi(TCCR2A, COM2B1);    break;
#endif
    }
}

void pinMode(uint8_t pin, uint8_t mode)
{
    uint8_t bit = digitalPinToBitMask(pin);
    uint8_t port = digitalPinToPort(pin);
    volatile uint8_t *reg, *out;

    if (port == NOT_A_PIN) return;

    // JWS: can I let the optimizer do this?
    reg = portModeRegister(port);
    out = portOutputRegister(port);

    if (mode == INPUT) {
        uint8_t oldSREG = SREG;
        cli();
        *reg &= ~bit;
        *out &= ~bit;
        SREG = oldSREG;
    }
    else if (mode == INPUT_PULLUP) {
        uint8_t oldSREG = SREG;
        cli();
        *reg &= ~bit;
        *out |= bit;
        SREG = oldSREG;
    }
    else {
        uint8_t oldSREG = SREG;
        cli();
        *reg |= bit;
        SREG = oldSREG;
    }
}

void digitalWrite(uint8_t pin, uint8_t val)
{
    uint8_t timer = digitalPinToTimer(pin);
    uint8_t bit = digitalPinToBitMask(pin);
    uint8_t port = digitalPinToPort(pin);
    volatile uint8_t *out;

    if (port == NOT_A_PIN) return;

    // If the pin supports PWM output, we need to turn it off
    // before doing a digital write.
    if (timer != NOT_ON_TIMER) turnOffPWM(timer);

    out = portOutputRegister(port);

    uint8_t oldSREG = SREG;
    cli();

    if (val == LOW) {
        *out &= ~bit;
    }
    else {
        *out |= bit;
    }

    SREG = oldSREG;
}

int main(void)
{
    pinMode(LED_BUILTIN, OUTPUT);
    digitalWrite(LED_BUILTIN, HIGH);

    while (true);
}


Wow. More than 200 lines of code. We start to see why this code might not be as fast as Atmel's. Every time we change the state of a pin, we run several reads from program space and a few if-elses, and the code even needs to disable interrupts. To be fair, it also does many more things, like checking that the port actually exists on our MCU. But if the user makes a mistake and tries to activate a pin on a port that doesn't exist, the error will be silent, and no one will notice.

Having seen examples of how C is used both for performance and for usability, and having pointed out the problems at both extremes, we can now answer the question: how does C++ let us sort out this mess?

Our main ally will be templates. By defining a few template structures in our library, we will get the compiler to reduce the generated code while keeping readability and safety. Incidentally, we will also make most errors show up early, during compilation, instead of remaining hidden until (in the best case) the first execution.

Starting from the bottom, the first thing is accessing the registers. There are two types of registers in the AVR: 8-bit and 16-bit. For all practical purposes, you use them the same way, and all you need to define them is their size and position in memory. Both are known at compile time, so they will be template arguments. As most functionality is common, it will live in a shared template base class. And since all the state information is stored in the register itself, the class will have no data members. We could even make everything static, but then we would lose the assignment operators, which help a lot with readability in this case.

First, we get rid of the DDRB register define:

#define DDRB _SFR_IO8(0x04)


and replace it with a struct. This way we skip one macro, gaining type safety and the ability to scope access with namespaces, while keeping all the flexibility. Everything at once.

struct DDBRegister {
    void operator=(uint8_t _r)
    {
        *reinterpret_cast<volatile uint8_t*>(0x24) = _r;
    }
    operator uint8_t() const
    {
        return *reinterpret_cast<volatile uint8_t*>(0x24);
    }
    operator volatile uint8_t&()
    {
        return *reinterpret_cast<volatile uint8_t*>(0x24);
    }
} DDRB;

So far, it's all advantages, and the generated code is still exactly the same two assembly instructions. But we can generalize it and extend it to other registers.

template<uint16_t address_>
struct Register {
    void operator=(uint8_t _r)
    {
        *reinterpret_cast<volatile uint8_t*>(address_) = _r;
    }
    operator uint8_t() const
    {
        return *reinterpret_cast<volatile uint8_t*>(address_);
    }
    operator volatile uint8_t&()
    {
        return *reinterpret_cast<volatile uint8_t*>(address_);
    }
};

Register<0x24> DDRB;
Register<0x25> PORTB;

To improve readability, we can add methods for bit setting and clearing.

template<uint16_t address_>
struct Register {
    void operator=(uint8_t _r)
    {
        *reinterpret_cast<volatile uint8_t*>(address_) = _r;
    }
    operator uint8_t() const
    {
        return *reinterpret_cast<volatile uint8_t*>(address_);
    }
    operator volatile uint8_t&()
    {
        return *reinterpret_cast<volatile uint8_t*>(address_);
    }

    template<uint8_t bit_>
    void setBit() { *reinterpret_cast<volatile uint8_t*>(address_) |= (1 << bit_); }
    template<uint8_t bit_>
    void clearBit() { *reinterpret_cast<volatile uint8_t*>(address_) &= ~(1 << bit_); }
};

Register<0x24> DDRB;
Register<0x25> PORTB;

Since the bit is a template parameter, the compiler still resolves these calls to a single instruction each, and user's code transforms to:

int main(void)
{
    DDRB.setBit<DDB5>();
    PORTB.setBit<PORTB5>();

    while (1);
}

That's exactly as fast as the original code, a bit more readable, and way safer. We could still get rid of the last defines, DDB5 and PORTB5, by converting them into static constexpr constants, but there are better ways to approach that. We are still far from Arduino's ease of use, which is our goal, and this code still allows many pitfalls, but this post is quite long already, so we will address all that in a second part.

## 2016/07/05

### Data Oriented Design vs Object Oriented Programming

I've been raised in the culture of Object Oriented Programming. I've always been told about the benefits of encapsulation, cohesion, locality, etc. There are very good reasons why a lot of smart people deeply support OOP. Designing good OOP architectures pays off. It saves a lot of time debugging errors, makes code easy to read and understand, and lets you focus on one part of the problem at a time.

But what if it's all wrong? In the last few years I've been reading about a concept known as Data Oriented Design, which many claim is a different paradigm promising huge performance improvements, one that will make you question why you ever used OOP in the first place. That's a big claim, and big claims require good proof. So, when I came across this talk by Mike Acton, I did the only thing I could do: I wrote a test.

The idea is simple: have a bunch of squares defined by their radius and compute their areas. This is where a traditional OOP beginner tutorial would say "Make a class for Square ...".

class NiceSquare {
    float radius;
    float color[3];
public:
    NiceSquare() : radius(3.f) {}
    void computeArea(float& area) { area = radius*radius; }
};


However, following the principles of DoD, we realise that our data is not a square, but a bunch of squares, so...

struct BunchOfSquares {
    float* radius;
    float* color;
};

There is a good reason for that color member. We will use it later to control the packing factor of our data. But we just sacrificed encapsulation for no good reason. If computing a square's area is something the square can do itself, then computing a bunch of areas should be something a bunch of squares can do itself too. What if we took this DoD approach to the problem, but implemented it with OOP?

class BunchOfSquares {
    float* radius;
    float* color;
public:
    BunchOfSquares() : radius(new float[N_SQUARES]), color(nullptr) {
        for (unsigned i = 0; i < N_SQUARES; ++i) radius[i] = 3.f;
    }

    ~BunchOfSquares() {
        delete[] radius;
    }

    void computeAreas(float* area) {
        for (unsigned i = 0; i < N_SQUARES; ++i) {
            float rad = radius[i];
            area[i] = rad*rad;
        }
    }
};


Much better now. Notice we didn't really sacrifice any object orientation here. We just realised which objects really belong to our problem. And that's actually the key. Most of the time, when we do OOP, we tend to design our classes to fit our mental model of day-to-day life. WRONG! You are not solving day-to-day life, you are solving a specific problem! Now the question of performance still remains, so let's measure it:

#include <chrono>
#include <fstream>
using namespace std;
using namespace std::chrono;

// Definitions assumed by the snippets below (the values are illustrative):
constexpr unsigned N_SQUARES = 1000000;
using real_millis = duration<double, milli>;

duration<double> oldBenchmark() {
    NiceSquare* squares = new NiceSquare[N_SQUARES];
    float* areas = new float[N_SQUARES];
    auto begin = high_resolution_clock::now();

    for (unsigned i = 0; i < N_SQUARES; ++i) {
        squares[i].computeArea(areas[i]);
    }

    duration<double> timing = high_resolution_clock::now() - begin;
    delete[] areas;
    delete[] squares;

    return timing;
}

duration<double> dodBenchmark() {
    BunchOfSquares squares;
    float* areas = new float[N_SQUARES];
    auto begin = high_resolution_clock::now();

    squares.computeAreas(areas);

    duration<double> timing = high_resolution_clock::now() - begin;
    delete[] areas;

    return timing;
}

int main(int, const char**) {
    ofstream log("log.txt");

    for (int i = 0; i < 100; ++i) {
        double oldTiming = real_millis(oldBenchmark()).count();
        double dodTiming = real_millis(dodBenchmark()).count();
        log << oldTiming << ", " << dodTiming << endl;
    }

    return 0;
}


If you pay attention, you will see the benchmarks have been written the old-fashioned way. It would be better to realise that I don't want just one timing, and that I won't perform just one benchmark, and to write the test to run a bunch of benchmarks and store the results in a bunch of timing records. But for now we'll stick to this format, because it will be easier to read for people not used to DoD, and because I like the irony of it.

Back to the test, a quick run shows this.

The improvement is obvious: even for a dumb example like this, DoD is about 40% faster. Cool. But can we do better? Theory says that the big performance improvements of DoD come from not wasting cache space; the better we use our caches, the faster the test will run. That's what the color member is there for. It represents the more realistic scenario where classes have more than one member. By controlling the size of color, we control how sparse the radiuses are in memory. That way, completely removing color should make both paradigms perform almost identically, right?
Definitely right. And if we move the other way around and increase color from 3 to 63 floats ...

That's absolutely a win: almost an 85% improvement. The DoD code is running more than 6x faster now. And it's still object oriented! We've lost none of the benefits of OOP!

In conclusion, Data Oriented Design doesn't mean throwing away everything you know about OOP and good programming practices. It is a reminder to solve the problems we actually have, instead of the problems we are comfortable thinking about. Even though its performance gains are very tightly coupled to low-level hardware, DoD principles tell us that our code can be messed up from a very high level. The moment you forget what data you are dealing with, you're already going the wrong way. Know your problem, know your data. Then you can apply whatever programming paradigm you think fits better. And if you decide to go for OOP, remember there's no rule saying an "object" in your code has to match any object in your day-to-day life. So just choose the right objects for your data.

## 2015/09/15

### Home made 3d printed rocket nozzle

One of the most important parts in a rocket engine is the nozzle. A well designed nozzle will provide the optimum expansion of the exhaust gases and maximize thrust. I am working on a home made rocket, and decided to do a few tests with a real nozzle just to get a real feel of how these devices work.
So, for starters, how does a rocket nozzle work and what is it for? The nozzle is basically a tube that redirects the combustion gases in one direction so as to push the rocket. While doing so, it also accelerates the gases and lowers their temperature. And all of that without a single moving part. Rocket nozzles achieve this because they belong to the category of convergent-divergent nozzles, or de Laval nozzles, which take advantage of the properties of gases at supersonic speeds. I won't get into much detail about the thermodynamics of ConDi nozzles, as there are many resources that explain them perfectly (https://en.wikipedia.org/wiki/De_Laval_nozzle).
The important point here is how to design the nozzle, and for that there are a few things we need to know:
- The properties of the gas that will flow through the nozzle (i.e. air), namely its heat capacity ratio, gamma.
- The amount of air we can provide per second (mass flow).
- Our working pressures.
About air, all we need to know is the heat capacity ratio, which happens to be just about 1.4.
The amount of air is a bit trickier. In my case it is limited by the air compressor I use to power the system. Knowing the exact amount of air that will flow is not as important as making sure that the nozzle becomes the limiting factor. For the air to go (locally) supersonic, it needs to choke the nozzle. That means you must be able to supply enough air to saturate it, or conversely, that the nozzle must be small enough to saturate with the air you can provide. For this reason, the simplest thing you can do is design the nozzle with a throat smaller than the smallest area of your air feed system. Just measure the ducts of your feed system and choose a smaller size for the throat. My throat is about 4 mm in diameter because the smallest duct of my air compressor has a diameter of 6 mm. Thanks to the pressure losses, it could actually be even a bit bigger than the duct and still choke (for a bigger area, a smaller pressure can choke the same mass flow), but this way we have some margin.
Finally, the working pressures (along with gamma) define the expansion ratio of the nozzle, the only thing we still need to fix its geometry. The expansion ratio, the ratio between the area at the end of the nozzle and the area at the throat, follows the standard isentropic area-ratio relation (with Pc the chamber pressure and Pe the exit pressure):

Ae/At = 1 / { [(gamma+1)/2]^(1/(gamma-1)) * (Pe/Pc)^(1/gamma) * sqrt( [(gamma+1)/(gamma-1)] * [1 - (Pe/Pc)^((gamma-1)/gamma)] ) }
Taking into account that Pe, the exit pressure, will be equal to ambient pressure (1.013 bar), and that my air compressor can deliver up to 8 bar, that results in an expansion ratio of about 4, so the exit radius will be about twice that of the throat. I will build my nozzle with a contraction angle of about 30º and an expansion angle of 15º.
As you can see in the pictures, the nozzle is a simple revolution solid.

Notice I left a pretty big input hole. That is because I use 3/8" plumbing to feed the system. That is conventional plumbing that you can buy in any hardware store, and it is pretty easy to work with and to seal. I also printed a small cap that turns a PVC tube into an adapter to hold the nozzle in place. This way I can put it on top of a weighing scale to measure thrust. The picture below shows my poor man's engine test bench, where I can measure pressure vs force.

I didn't expect much of this at first, but the results impressed me. The whole "engine" (the chamber plus the nozzle) weighs less than 10 grams, and it delivers more than 200 grams of thrust. A thrust-to-weight ratio of more than 20. Not bad for 10 grams of plastic.

## 2015/04/09

### Prometheus Arm: Latest changes.

In the last few weeks, we have made some improvements to the design of the Prometheus arm.
To use antagonistic actuation in a simple fashion, it is important that the actuation of each degree of freedom remains symmetrical. This means that if you open a finger a little, both the inner and outer tensors should be displaced the same length. Initially, we chose a pulley system to guarantee this. This iteration was tested in the first version (the A3 project) and showed some major drawbacks: the tensors degraded quickly and loosened the joints, leading to poor performance of the fingers. To solve this issue, we developed a system with adjustable pulleys that allowed the tensors to be readjusted.
These moving pulleys are hard to print and require the printer to be very well calibrated. Besides, the teeth degrade rather quickly. Taking that into account, along with the fact that the fingers were becoming complex to assemble, this solution didn't seem suited for anyone to print and assemble at home. A deep redesign was needed.

So we wanted to do three things: get rid of as many tensors as possible, increase robustness and simplify assembly. The solution is a simple bar-actuated mechanism.

Two solid bars link the moving parts of the finger and reduce the number of degrees of freedom to one. This way, the whole finger can be operated from the base, and a symmetric mechanism is only needed there. The design is also robust if you choose the right orientation at print time, and the reduced feature complexity and part count make it easier to print and assemble. Here is a proof of concept of the finger. The next step is to print a complete finger with the servo adapter, and then test its actuation triggered by electrodes.

## 2015/03/23

### Helicopter Electronics: Test #1

Quadcopters are so popular these days, but I still prefer traditional helicopters. They are more efficient, can carry heavier payloads, give a faster response ... and are a bigger challenge. That's why I'm revamping an RC helicopter into an autonomous platform to test a few technologies. This is the first test of some of its electronics.

Sidenote: that's my little brother you hear in the video, and he's actually operating the controls.

## 2015/03/22

### A perfect fit for A3

It's been a while since I last wrote about the A3 project. It has recently morphed into the Prometheus Arm project, on which I'm working with a friend of mine. What motivated the change? Take a look for yourself:

I must admit it: I can't resist Iron Man. That's just it, and now I want to contribute. My friend Pablo and I are now working on a better design of the arm, balancing cost, simplicity, repairability and functionality, and we will try to contribute it to Limbitless' project.

Getting specific, we are running the first tests with a 3d-printed, improved version of the antagonistic mechanism that allows for better, finer and simpler adjustment of the joints.

The main problem with antagonistic actuation is obviously the cost. Since you are basically doubling the number of actuators, the cost goes up very quickly. However, a good mechanical design can simplify things a lot. I still don't have it fully documented, but I will soon post a method that uses symmetric kinematic chains to factor out some actuators while retaining the basic functionality. Reducing the number of actuators necessarily reduces the number of degrees of freedom (originally two per joint), but I think this is a good trade-off if all you are losing is strength degrees.
More specifically, the tension regulators are shared among similar joints, so they will all be adjusted at the same time, but individual elasticity and torque tolerance are fully kept. I don't think trying to exert different amounts of force with each finger is a very common use case, so it is definitely worth the cost reduction.

## 2015/03/08

### Set up TortoiseGit to work with GitHub

Introduction:
TortoiseGit is the easiest and most comfortable Git interface I know of for Windows. It integrates seamlessly with Windows Explorer and simplifies common tasks such as commit, checkout, pull, push, etc. I'm so used to it that I always recommend it to everyone.
If there is anything I advocate even more than using Tortoise, it is using Git itself. Version control systems are one of the most powerful tools a developer can have. Switching from traditional (manual) code backup systems to any form of version control is a qualitative change that can easily increase your productivity by an order of magnitude. Not to mention the benefits of version control when you're working on a multi-person project; in that case, version control is simply a must. And of all the VCSs I've tried, Git is the best by far. Under Windows, Git is supported by mSysGit.
Below, I will present a way to install both mSysGit and TortoiseGit and make them work together with GitHub, one of the most popular repository hosting services on the internet, and the one I use most.