How do you achieve a high level of response in dog training? You might think that dispensing an awesome treat for every successful trial would make a dog reliable.
But in practice, rewarding for all trials of a cued behavior results in lower response rates, not higher. Referred to as Continuous Reinforcement (CR), rewarding every iteration causes a behavior to degrade: The dog knows a treat will be waiting, so why hurry? In fact, what keeps a dog at the top of her game with cued behaviors is to reward randomly—and handsomely.
The Vending Machine vs. the Slot Machine
At the outset of teaching a brand new cue, Continuous Reinforcement is the most effective option. For instance, when teaching a puppy to sit, rewarding each successful sit makes sense because your focus is on clearly pairing the verbal cue and hand gestures with the behavior. You’re basically a vending machine: Put the money in (your puppy sits) and the reward appears.
But staying a vending machine for too long causes the puppy to stop working so hard. Why bother sitting quickly when a treat inevitably appears regardless of how long it takes that butt to hit the ground? Prolonged Continuous Reinforcement also makes the dog dependent on the food reward: She will refuse to work unless food is present.
Before you get to that point (usually within a few days of teaching a new cue), it’s time to move to some sort of intermittent reinforcement schedule. Which means it’s time to retire the vending machine and fire up the slot machine.
Once your dog reliably performs a behavior on cue using Continuous Reinforcement, shift to Variable Ratio (VR) reinforcement. Variable Ratio is a slot machine, pure and simple. When you’re in Vegas playing the slots, what keeps you pumping quarters for hours—besides the free cocktails, of course—is this:
The probability of hitting the jackpot remains constant, even though the number of plays required to hit the jackpot changes.
Start with a low ratio; VR3 means that you reward, on average, 1 out of every 3 responses. For example:
Reward trials 1, 2, 7, 9, 15, 18, 19, 23, 29, and 30
In this example, we provide 10 rewards over the course of 30 trials, which averages out to 1 in 3, or VR3. You can then increase your ratio, to VR5 for example, which decreases the frequency of reward. Raising the ratio too fast can frustrate your dog (known as “ratio strain”), so take it slow.
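If you would rather not pick the rewarded trials by feel, the arithmetic above can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: it fixes the number of rewards at the number of trials divided by the ratio and scatters them at random, which matches the 10-in-30 example but is not the only way to build a variable schedule. The function name `variable_ratio_schedule` and the `seed` parameter are made up for this sketch.

```python
import random


def variable_ratio_schedule(num_trials, ratio, seed=None):
    """Pick which trials (numbered from 1) to reward.

    Assumption for this sketch: exactly num_trials // ratio trials are
    rewarded, chosen at random, so the average works out to 1 in `ratio`.
    """
    rng = random.Random(seed)
    num_rewards = num_trials // ratio
    # sample() never repeats a trial number; sort for easy reading.
    return sorted(rng.sample(range(1, num_trials + 1), num_rewards))


# The hand-picked example from the text: 10 rewards in 30 trials.
article_example = [1, 2, 7, 9, 15, 18, 19, 23, 29, 30]
average_ratio = 30 / len(article_example)  # 3.0, i.e. VR3

# A freshly generated VR3 plan for a 30-trial session.
schedule = variable_ratio_schedule(30, 3, seed=1)
print("Reward trials:", schedule)
```

Moving to VR5 is just `variable_ratio_schedule(30, 5)`, which yields 6 rewarded trials instead of 10.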
Once you’ve quickly moved past Continuous Reinforcement and your dog is reliably performing with a Variable Ratio schedule, it’s time to polish up the behavior by only rewarding the very best trials. This is referred to as Differential Reinforcement of Excellent Behaviors, or DRE. At this stage you get picky about what you’re willing to reward and act as a referee.
For example, to refine a puppy’s sit using DRE, only reward when she sits immediately, with no delay and without moving from the spot where she is standing. If she delays or takes any steps before sitting, simply turn away, ignore her, and wait a few minutes before trying again. This is a good time to use very high-value food (chicken, cheese, liver) because your dog will be more willing to work, just as you’d be more likely to work holidays if you were paid double-time-and-a-half.
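One way to make "only the best trials" concrete is the rule of thumb of rewarding a trial only when it is better than at least half of the trials so far. The Python sketch below applies that rule to sit times. It rests on assumptions that are not part of the method itself: that you time each sit in seconds, that a shorter time counts as better, and that the very first trial is rewarded because there is nothing to compare it with. The name `referee` and the sample times are invented for illustration.

```python
from statistics import median


def referee(latencies):
    """Return one True/False reward decision per trial.

    A trial earns a reward when its latency (seconds from cue to sit) is at
    or below the median of all earlier trials, i.e. it is at least as good
    as half of what the dog has done so far.
    """
    decisions = []
    for i, latency in enumerate(latencies):
        history = latencies[:i]
        # Assumption: with no history yet, reward the first trial.
        decisions.append(not history or latency <= median(history))
    return decisions


# Hypothetical sit times, in seconds, for five trials.
sit_times = [2.0, 1.5, 3.0, 1.0, 1.8]
rewards = referee(sit_times)
print(rewards)  # [True, True, False, True, False]
```

Because the bar is the median of her own past trials, it rises as she gets faster, so the standard tightens gradually rather than all at once.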
Keep It Simple
Teach a new cued behavior and make it reliable by moving quickly through this hierarchy:
- A vending machine that rewards all successful trials (Continuous Reinforcement);
- A slot machine that rewards random successful trials (Variable Ratio); then
- A referee who judges each trial and rewards it only if it is better than at least half of all trials (Differential Reinforcement of Excellent Behaviors).
The progression through these levels is not rigid, and you may combine aspects of more than one as you go. Be ready to back up a step if you’ve moved too fast; your dog will let you know!