The source and target meshes ("cages") must have the same number of vertices and the same triangle connectivity.
The ordering of the vertices matters too. Vertex 1 on the source must correspond to vertex 1 on the target, meaning that both must sit at roughly the same anatomical location on the human. The same holds for vertex 2, vertex 3, and so on. In your images, it looks like this is not always the case.
If this property is not satisfied, you get the artefact called "texture swimming", and the transferred bones will also come out with strange shapes.
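
As a rough sanity check before running the transfer, the first two requirements (vertex count and connectivity) can be tested automatically. The sketch below is only an illustration: the function name `check_cage_compatibility` and the input layout (vertices as a list of (x, y, z) tuples, triangles as a list of vertex-index triples) are my assumptions, not part of any particular tool. It cannot verify the anatomical correspondence itself; that still has to be inspected visually.

```python
def check_cage_compatibility(src_vertices, src_triangles,
                             tgt_vertices, tgt_triangles):
    """Return a list of problems; an empty list means the basic checks passed.

    Assumed (hypothetical) layout: vertices are (x, y, z) tuples and
    triangles are triples of vertex indices.
    """
    problems = []

    # Requirement 1: same number of vertices on both cages.
    if len(src_vertices) != len(tgt_vertices):
        problems.append(
            f"vertex count differs: {len(src_vertices)} vs {len(tgt_vertices)}"
        )

    # Requirement 2: same connectivity. This is a strict test: the
    # triangle lists must hold the same index triples in the same order.
    src_tris = [tuple(t) for t in src_triangles]
    tgt_tris = [tuple(t) for t in tgt_triangles]
    if src_tris != tgt_tris:
        problems.append("triangle connectivity differs")

    return problems


# Tiny example: the target is a deformed copy of the source, so the
# vertex order and the triangle list are unchanged.
src_v = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
tgt_v = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 2.0)]
tris = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]

print(check_cage_compatibility(src_v, tris, tgt_v, tris))
```

If both checks pass and you still see texture swimming, the likely cause is the vertex ordering: the two cages have matching counts and triangles, but some vertex indices sit at different anatomical locations on the source and the target.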