In dog training, we are often told to keep our rewards unpredictable. “Don’t reward every time. Dogs are going to be more obedient if your rewards are like the lottery.”
One thing I hate about dog training advice is that the people dispensing it never back up what they say with any reference to scientific studies. That doesn’t mean those studies don’t exist, just that I have to go hunting for them on my own 😉
I tend to reward my dog more often than not. I’d say about 90% of the time she gets a reward for completing a command successfully. Sometimes I remember that I’m supposed to be unpredictable, and I’ll make a conscious effort to break up the treats a little, but to be honest, I like rewarding her for a job well done.
So what does the science say? Am I a dog training failure? Maybe 😁
Let’s define the term Schedule of Reinforcement first:
A protocol or set of rules that a teacher will follow when delivering reinforcers (e.g. dog treats). These rules dictate when, and how often, a reinforcer is delivered.
There’s a whole bunch of schedules that some science-type took the time to define, and the list is horribly boring, so I’ve trimmed it down to the two most pertinent ones in dog training: Continuous Schedule (rewarding after every single correct response) and Variable Ratio (lottery reward: get a reward after two correct responses, then a reward after five, then a reward after one, and so on. Unpredictable.)
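If it helps to see the difference written down, here is a minimal Python sketch of the two schedules. It is only an illustration: the function names, the `mean_ratio` parameter, and the choice to draw the gap between rewards uniformly are all my own assumptions, not anything taken from a study.

```python
import random


def continuous_schedule(n_responses):
    """Continuous schedule: every correct response earns a reward."""
    return [True] * n_responses


def variable_ratio_schedule(n_responses, mean_ratio=3, seed=None):
    """Variable ratio: reward after an unpredictable number of correct responses.

    Assumption for this sketch: the number of responses needed for the next
    reward is drawn uniformly from 1..(2 * mean_ratio - 1), so it averages
    out to mean_ratio.
    """
    rng = random.Random(seed)
    rewards = []
    needed = rng.randint(1, 2 * mean_ratio - 1)
    count = 0
    for _ in range(n_responses):
        count += 1
        if count >= needed:
            # Requirement met: give the treat and draw a fresh requirement.
            rewards.append(True)
            count = 0
            needed = rng.randint(1, 2 * mean_ratio - 1)
        else:
            rewards.append(False)
    return rewards


def show(rewards):
    """Render a schedule as 'treat' / '-' per correct response."""
    return " ".join("treat" if r else "-" for r in rewards)


print("continuous:    ", show(continuous_schedule(10)))
print("variable ratio:", show(variable_ratio_schedule(10, mean_ratio=3)))
```

Each entry stands for one correct response, and the variable ratio line comes out different every run unless you pass a `seed`.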
The idea with dog training is this:
(I’m going to use ‘treats’ here as a synonym for reward, but you can, of course, use non-food rewards also)
When you are teaching your dog a new concept, you want to get them hooked, and make sure they know they’re doing the right thing. So you reward them generously. Sit-treat-sit-treat-sit-treat and so on and so forth. They are loving it.
Once you know your dog understands the sit command like a pro, and is feeling confident about things, you can move on to phase two.
In phase two, you cut back on the treats, not so much that they give up, but so that they’re never quite sure when they’re going to be rewarded. This is a gradual process. You don’t want to go random cold turkey. Start off giving a treat every second sit, then every third, then after one, then after three again, etc.
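The gradual cut-back described above could be sketched roughly like this in Python. The stage lengths, the `max_gaps` values, and the function name are hypothetical numbers I picked for illustration; your dog gets the final say on how fast to move.

```python
import random


def thinning_plan(sits_per_stage=10, max_gaps=(1, 2, 3), seed=None):
    """Return one True/False per sit (treat / no treat).

    Assumption for this sketch: training is split into stages, and within a
    stage the next treat arrives after a random 1..max_gap sits. A max_gap
    of 1 means a treat every time (phase one); larger values make treats
    rarer and less predictable.
    """
    rng = random.Random(seed)
    plan = []
    for max_gap in max_gaps:
        stage = []
        while len(stage) < sits_per_stage:
            gap = rng.randint(1, max_gap)
            # (gap - 1) unrewarded sits, then one rewarded sit.
            stage.extend([False] * (gap - 1) + [True])
        # Trim any overshoot so each stage has exactly sits_per_stage sits.
        plan.extend(stage[:sits_per_stage])
    return plan


plan = thinning_plan(seed=None)
print(" ".join("treat" if t else "-" for t in plan))
```

The first stage is plain continuous rewarding, and each later stage only stretches the possible wait by one sit, which is the "not cold turkey" part.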
I’ve seen some books claim that dogs become addicted (like a gambler) to the unpredictable-yet-tantalizing-possibility of a treat. I don’t know about that, but there is some actual science going on here that makes sense.
Let’s imagine there is a vending machine at your work, and every day on your break, you walk to the vending machine, pop a dollar in, and out comes a chocolate bar.
It dispenses that chocolate bar without fail. It is extremely predictable in its chocolate-bar-dispensing powers.
Then one day, you stick a dollar in, and nothing comes out. You are frustrated. Maybe you try another dollar. Nothing happens. What do you do next? (I mean, other than calling the vending machine company 😜) Most normal people would be irritated, and they would also stop putting money in, assuming that the vending machine is not going to work. (When a behaviour fades away like this because the reward has stopped coming, it is known as Extinction.)
Now let’s imagine that the vending machine does something a bit different: Sometimes, when you put a dollar in you get a chocolate bar. Other times, you get TEN chocolate bars. And sometimes, you get none.
You’re never sure which is going to happen, but you know that if you keep putting dollars in, you will eventually get something, and there’s always a chance it will be the ten chocolate bar jackpot.
So what happens when you go down on your break and the vending machine doesn’t dispense a chocolate bar? You try again. And again. And again. And yes, you would still eventually give up at some point, but it would take a hell of a lot longer.
“BUT WHERE IS THE SCIENCE” you shout. “YOU PROMISED US SCIENCE”.
Inexperienced workers had higher productivity on the continuous reinforcement than on the variable schedule; experienced workers had higher productivity on the variable schedule than on the continuous schedule. Both the experienced and the inexperienced employees preferred the variable schedule over the continuous schedule.
Variable schedules produce higher rates and greater resistance to extinction than most fixed schedules. This is also known as the Partial Reinforcement Extinction Effect (PREE).
A study of thirsty rats getting water (poor rats!)
Animals that experienced no water reward on 33% of the reinforcement trials subsequently demonstrated an increased resistance to extinction of the runway response compared to continuously reinforced (CRF) animals.
So what does this all mean?
Well first, that I’m a bad trainer 😊
But also: Science agrees that keeping your rewards unpredictable is the way to go. With one caveat: When you are still teaching the behaviour, it is actually more beneficial to be on a continuous reinforcement schedule, aka rewarding every single time.
Of course, all this learning theory can work against you too. Which is why Betsy still stops mid-walk and tries to drag me backwards in order to sniff something. Because 70% of the time I don’t let her, but the other 30% I’m like, “Ehhh, whatever, I’m sleepy and what does it even matter.” That random reinforcement is the downfall of Betsy’s loose leash training. 💖