Copy and move semantics

Recently, an academic junior (by a few years) asked me some questions about smart pointers. In the discussion, we came across terms like copy-constructible and move semantics. While these concepts may be straightforward for more experienced C++ developers, they are more obscure for programmers who were brought up in the era of garbage-collected languages.

Let’s go through the easy part first: copy semantics. When we talk about copy semantics, we mean that code like this:

vector<int> list;
<..>
vector<int> list2(list);  // Copy constructor

vector<int> list;

/* Thanks to Richie for providing a
   correct version of assignment. */
vector<int> list2;
list2 = list;  // Copy-on-assign

will result in list2 copying the contents of list. The first example uses a copy constructor: a constructor that takes as its parameter an object of the same type as the constructor’s class (roughly speaking) and copies the contents of the passed object. It generally means that changing the contents of list will not change the contents of list2, and vice versa. A class whose copy constructor fulfills copy semantics is sometimes called copy-constructible.

The second example also performs a copy. The operation is, however, implemented by the assignment operator. This is sometimes called copy-on-assign; the standard term is copy assignment.

We should perhaps note that the default (compiler-generated) copy constructor and assignment operator do not always fulfill copy semantics. In fact, the result may be neither copy nor move semantics; neither here nor there. The defaults perform (more or less) a shallow copy; a mixed semantics, if you like. They copy each field directly. If a field is of a primitive type, this correctly performs a copy. If a field is of a type that correctly implements copy semantics, the default also copies it correctly; otherwise, the result will not adhere to proper copy semantics. Furthermore, if a field is of pointer type, what gets copied is the address stored in the pointer, not the object it points to. Hence, both the new object and the old one will contain the same address in the pointer field. Most of the time, this is not what you want.

Remember not to rely on the defaults: either implement your own copy constructor and copy assignment operator, or declare them private and leave them undefined so that the compiler will not generate public versions. Boost has a base class called noncopyable that you may want to use. I prefer using a macro (it only declares the two members; they are deliberately never defined, so an accidental copy fails at compile time or, from inside the class, at link time):

#define DISABLE_COPY_AND_ASSIGNMENT(classname) \
  classname(const classname&);                 \
  classname& operator=(const classname&)

// In some_class.h:
class SomeClass {
 public:
  <..>
 private:
  <..>
  DISABLE_COPY_AND_ASSIGNMENT(SomeClass);
};

Moving on, move semantics arise (more or less) because copy semantics impose a performance overhead due to memory allocation and initialization. In many cases, you simply want to hand an object over to another piece of code without performing an expensive copy. In the olden days, you would allocate the object on the heap and pass the pointer along. However, this has two major problems: (1) as long as you still hold a copy of the pointer, you can still modify the object (a big issue when multi-threading is involved); (2) you have to manage the memory manually. Hence, we arrive at modern move semantics. :)

Today, most C++ developers are well acquainted with smart pointers (if you do not know what they are, please Google them; they are important). One of the basic smart pointers in C++0x is called unique_ptr. A unique_ptr maintains (as its name implies) a unique pointer: when one instance of unique_ptr holds a pointer to an object, no other instance may hold a pointer to the same object. Furthermore, people have developed techniques to ensure that you never need to hold a raw pointer at all. All seems good. Well, not really. Smart pointers are usually stack-allocated themselves, which means that they die (along with the object they hold) when they go out of scope. Here is where move semantics become useful:

unique_ptr<SomeClass> createSomeClass() {
  unique_ptr<SomeClass> ptr(new SomeClass(<..>));
  <..>
  return ptr;
}

// Somewhere else:
unique_ptr<SomeClass> a_ptr = createSomeClass();

Note that, technically, two constructions occur here (however, many modern compilers elide one or both as an optimization). First, when you return ptr, the return value is constructed with ptr as the argument. After that, a_ptr is initialized from this returned object; since a_ptr is being declared, this is an initialization that calls a constructor, not an assignment. unique_ptr has no copy constructor, so the constructor called in both steps is the move constructor. Roughly, a move constructor will pilfer the pointer from the unique_ptr passed to it; the original unique_ptr will no longer hold the pointer. Hence the term move semantics. Similarly, when you assign to an already existing unique_ptr, the pointer is moved from one unique_ptr to the other.

Most of the time, you will adhere closely to either copy or move semantics. However, sometimes you may want to consider partial copy semantics, e.g. a shallow copy. This is generally acceptable when the cost of following full copy semantics would be prohibitive. However, we usually do not mix copy semantics and move semantics together. They generally don’t play well together and will confuse other developers. (There is no such thing as mixed copy/move semantics.)


3 Responses to “Copy and move semantics”

  1. Another c++ Blog » Visual C++ Team Blog : decltype: C++0x Features in VC10, Part 3 Says:

    [...] Copy and move semantics [...]

  2. Richie Says:

    vector<int> list2 = list; // Copy-on-assign

    This is not correct. The compiler will treat this line the same way as the first line and use the copy constructor (try it out yourself!). If you want to use the assignment operator you have to split up the declaration and the initialization in two lines.
    Here is a small test program:

    #include <iostream>
    #include <string>
    
    class Test {
    private:
    	std::string name;
    public:
    	Test(std::string name) {
    		this->name = name;
    		std::cout << name << ".Test()" << std::endl;
    	}
    
    	Test(const Test& t) {
    		name = t.name;
    		std::cout << name << ".Test(const Test& t)" << std::endl;
    	}
    
    	Test& operator=(const Test& t) {
    		name = t.name;
    		std::cout << name << ".operator=(const Test& t)" << std::endl;
    		return *this;
    	}
    
    	~Test() {
    		std::cout << name << ".~Test()" << std::endl;
    	}
    };
    
    int main(int argc, char** argv) {
    	Test test1("test1");
    	Test t2(test1);
    	Test t3 = test1;
    	t2 = t3;
    	return 0;
    }

    It’s a quite strange exceptional rule in C++ and I was quite surprised reading it a few days ago.

  3. shards Says:

    Yes, you’re perfectly right. I actually have not had to use the assignment operator at all recently (a lot of my code disallows copy and assign specifically). But you’re perfectly right, I’ll edit the post to reflect this. (:

    Thanks!

    P.S. Sorry for the late reply, I was too busy with work and research to check on the blog.
