ControlNet with Stable2go

Once you’ve mastered the Remix and More tools, there’s a more powerful option called ControlNet: a set of modes that transform an uploaded image into new images. Take a moment to study the image above.

On the upper left, the “control image” is a real, original photograph. This is the input. On the right are the transformations that the various ControlNet modes produced from the control image. Most of the names are self-explanatory: Segment, Edges, Contours and Depth use the physical aspects of the photo as the basis for the new image, so the general shape is transferred. In the case of Reference, you can see that the sofa was sampled and moved to a different part of the room. Not shown here is Skeleton, which involves uploading a special kind of pose file.

Let’s get into it and learn by doing.

Lesson Goals

  • Learn about ControlNet and its modes
  • Turn Bruce Lee into a Muppet & create variations
  • Upload a “skeleton” pose and make a person in that pose
  • Upload a shape and render into its outline


Download these 2 images to your device to follow along.


Learn Edges: Make Muppet Bruce Lee

Step 1: Upload the source photo

The file was provided above. In your browser, click the Upload button at the top right and add the Bruce Lee photo.

You will see a confirmation message that says “Upload received, enter remix prompt below”.

Step 2: Click ControlNet



Step 3: Write your prompt

So far, we have selected a template, with “edges” as the mode. Next, copy the prompt below. /masks is an optional debugging command that lets you see how the image was understood.

A strange colorful cartoon Muppet

Step 4: Choose the concept

Pick a model from the concepts system instead of a recipe. A recipe adds its own keywords, which we don’t want for this exercise.

Here, we have selected the concept Level4, which will produce some pretty silly characters.

Because we added /masks, an additional image appears with the results, revealing the mask that was extracted from the picture.

This helps us troubleshoot and fix our prep image if unwanted elements are interfering with our results.

Repeat this lesson with the sign post photograph provided above, and try a different mode.

Compare the results of Edges and Depth. Can you see the difference in the masks? Depth was a lot more successful at providing a template for our Muppet, because the edges of the body and face are soft rather than sharply defined. Understanding which mode works best for each project, and inspecting the mask, will make you a master at ControlNet. You can also limit the effect:


You can control how strongly the effect is applied using a guidance parameter.


You don’t have to re-upload the photo every time. You can go directly into ControlNet and see your previous uploads.

Resolution matters. Rendering at the exact same size as your uploaded photo will yield the best results.
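If you aren’t sure what size your upload is, you can read it from the file before choosing a render size. The sketch below is only an illustration and is not part of Stable2go: it assumes the upload is a PNG file (a JPEG would need a different reader), and the names png_size and matches_upload are made up for this example.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of PNG file bytes."""
    # A PNG starts with an 8-byte signature, then a 4-byte chunk length,
    # then the 4-byte chunk type, which must be "IHDR" for the first chunk.
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are two big-endian 32-bit integers right after "IHDR".
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def matches_upload(upload: tuple[int, int], render: tuple[int, int]) -> bool:
    """True when the render size equals the uploaded photo's size exactly."""
    return upload == render


if __name__ == "__main__":
    # Hand-built PNG header for a 768x512 image, used here instead of a file.
    header = (PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR"
              + struct.pack(">II", 768, 512))
    print(png_size(header))  # (768, 512)
```

With a real file, you would pass the result of open("photo.png", "rb").read() to png_size and then render at the width and height it returns.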

Each ControlNet mode has strengths and weaknesses

  • Depth sometimes hallucinates extra information based on the shape of the object, so you may end up with unwanted extras. You may expect the background of your render to be void of objects, but Depth picked up on things there that you can’t see. We think this has something to do with the model’s training, so unnatural images may confuse it.
  • Edges and Contours are perfect for cutting out a shape or person and placing them on a blank canvas: the render fills in the rest of the scene and looks very cool. Depth, though, doesn’t appear to work at all on these images.


  • Edges (Canny): best for objects and obscured poses. It creates a line drawing of the subject, like a coloring book, and fills that in.
  • Contours (HED): an alternative, fine-focused version of Edges. Contours and Edges retain the most resemblance to the control image.
  • Depth: as the name implies, creates a 3D depth mask to render into.
  • Segment: detects standalone objects in the image.
  • Reference: attempts to copy the abstract visual style from a reference image into the final image.
  • Pose: best for people whose joints are clearly defined, when you want to completely discard the original photo’s finer details and keep just the pose.

One of these modes is very much unlike the others:

  • Skeleton: upload the ControlNet-extracted mask from a pose, and render from that skeleton’s pose. It can only be used as an input here.
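To make the “line drawing, like a coloring book” idea behind Edges more concrete, here is a tiny sketch of how an edge mask can be pulled out of a picture. It is a simplified stand-in, not the real Canny detector and not what Stable2go runs: real Canny also smooths the image and thins and links the lines. The function name edge_mask and the threshold value are assumptions made for this example.

```python
def edge_mask(gray, threshold=100):
    """Mark pixels whose brightness changes sharply compared to a neighbour.

    gray is a list of rows of 0-255 brightness values. Returns a grid of the
    same size with 1 where an edge was found and 0 elsewhere.
    """
    height, width = len(gray), len(gray[0])
    mask = [[0] * width for _ in range(height)]
    # The last row and column are skipped because they have no
    # right/lower neighbour to compare against.
    for y in range(height - 1):
        for x in range(width - 1):
            dx = abs(gray[y][x + 1] - gray[y][x])  # change to the right
            dy = abs(gray[y + 1][x] - gray[y][x])  # change downward
            if max(dx, dy) >= threshold:
                mask[y][x] = 1
    return mask


# A 5x5 "photo": a bright 3x3 square on a dark background.
photo = [
    [0, 0, 0, 0, 0],
    [0, 255, 255, 255, 0],
    [0, 255, 255, 255, 0],
    [0, 255, 255, 255, 0],
    [0, 0, 0, 0, 0],
]

for row in edge_mask(photo):
    print("".join("#" if value else "." for value in row))
```

Running it prints a small grid in which # traces the boundary of the square while the middle of the square stays empty. That outline-only picture is the kind of mask that Edges hands to the renderer to fill in.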


If you have time for one more lesson, let’s try another mode:

Step 1:  Upload an “OpenPose” Skeleton

We call this the UpUp pose. You can find thousands of OpenPose skeletons all over the web.


Step 2: Choose Skeleton Mode

Skeleton follows the general motion lines of the input.

You can output a skeleton mask by adding /masks to the Other field.

Step 3: Enter this cool prompt and Render it

Don’t forget to choose an art style. The concept used below is called BreakAnime. Prompt by community member Trako.

2d, kicking a tiger, redhead, bangs, pigtails, absurdres, 1girl, angel girl, garter belts, training clothes, checkered legwear, white skin, cute halo, cross-shaped mark, colored skin, (monster girl:1.3), angelic, innocent, shiny, reflective, intricate details, detailed, dark flower dojo, thorns, [lowres, horns, blurry, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, (low quality, worst quality:1.4), normal quality, jpeg artifacts, signature, watermark, username, blurry, monochrome, error, simple background,] <breakanime>
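The prompt above mixes three kinds of text: plain keywords, a block in [square brackets], and a tag in <angle brackets>. The sketch below pulls them apart so you can see what each piece contributes. It rests on two assumptions that this lesson does not spell out: that the bracketed block holds the negative keywords (things to avoid), and that the angle-bracket tag names the concept. The function split_prompt is made up for this example and is not a Stable2go feature.

```python
import re


def split_prompt(prompt: str) -> dict:
    """Split a prompt into positive keywords, negative keywords, and concepts.

    Assumptions (not stated in the lesson): [...] holds negative keywords
    and <name> selects a concept.
    """
    concepts = re.findall(r"<([^<>]+)>", prompt)
    negatives = re.findall(r"\[([^\[\]]*)\]", prompt)
    # Whatever is left after removing tags and bracketed blocks is positive.
    positive = re.sub(r"<[^<>]+>|\[[^\[\]]*\]", "", prompt)

    def keywords(text):
        # Split on commas and drop empty pieces left by trailing commas.
        return [k.strip() for k in text.split(",") if k.strip()]

    return {
        "positive": keywords(positive),
        "negative": keywords(",".join(negatives)),
        "concepts": concepts,
    }


# A shortened version of the prompt from this lesson.
parts = split_prompt(
    "2d, kicking a tiger, (monster girl:1.3), dark flower dojo, "
    "[lowres, blurry, bad hands,] <breakanime>"
)
for name, values in parts.items():
    print(name, values)
```

This naive version splits on every comma, so a weighted group that contains a comma, such as (low quality, worst quality:1.4) in the full prompt, would be cut in two. Treat it as a reading aid rather than a real parser.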