L16 - Deep Copy

#ece244

shallow copy: fails when data members are pointers

Take, for example, a class whose data member is a pointer to a C-style string (char*).

Note: strcpy and strlen work essentially the same way as in C. In C++, the #include statement uses <cstring> instead of <string.h>.

#include <iostream>
#include <cstring>
using namespace std;
class MyString {
	private:
		int len;
		char* buf;
		// buffer string, which is a pointer to first char of string
	public:
		MyString() { // default constructor
			len = 0;
			buf = nullptr;
		}
		MyString(const char* src) { // constructor if pointer to character passed
			// const char*: string literals such as "Hello" are const in C++
			buf = new char[strlen(src) + 1];
			// allocate memory for buf
			// allocate one char more for \0
			strcpy(buf, src); // copies src to buf
			len = strlen(src);
		}
		MyString(const MyString& x); // copy constructor, defined below the class
		void setString(const char* src) { // must be a member to access buf and len
			delete[] buf; // deallocates the memory that buf points to
			buf = new char[strlen(src) + 1]; // allocates new memory for buf
			strcpy(buf, src); // copies the characters at src to buf using strcpy
			len = strlen(src); // sets private var len to length of src
		}
};
MyString::MyString(const MyString& x) { // shallow copy, from previous lecture
	len = x.len;
	buf = x.buf; // copies the address only, NOT the characters
}

int main() {
	MyString a("Hello"); // a.buf now points to memory holding "Hello"
	MyString b(a); // shallow copy: b.buf holds the same address as a.buf
	b.setString("Oops"); // frees the buffer that a.buf still points to, so a.buf now dangles
	// because buf is a pointer, only the address was copied, not the string
	return 0;
}

The issue here is that buf is a pointer, so when b(a) copies it, it copies the address stored in a.buf and not the string at that address. a and b therefore share a single buffer: any change made to the characters through b.buf is also visible through a.buf, and when b.setString("Oops") frees that buffer, a is affected too.
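To see the shared address directly, here is a minimal sketch. The class name ShallowString and its data() and release() members are hypothetical additions (not from the lecture), used only to compare the two pointers and to free the buffer exactly once. The class deliberately has no destructor, because one would free the shared buffer twice.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for MyString; data() and release() are assumptions
// added so the sharing can be observed safely.
class ShallowString {
  private:
    int len;
    char* buf;

  public:
    ShallowString(const char* src) {
        assert(src != nullptr);                    // expects a valid C string
        len = static_cast<int>(std::strlen(src));
        buf = new char[len + 1];                   // +1 for the terminating '\0'
        std::strcpy(buf, src);
    }

    // Shallow copy, as in the lecture: copies the address, not the characters.
    ShallowString(const ShallowString& x) {
        len = x.len;
        buf = x.buf;
    }

    const char* data() const { return buf; }

    // Frees the buffer. With a shared buffer, only ONE object may call this.
    void release() {
        delete[] buf;
        buf = nullptr;
    }
};

// Returns true if the copy points at the very same buffer as the original.
bool shallowCopySharesBuffer() {
    ShallowString a("Hello");
    ShallowString b(a);
    bool shared = (a.data() == b.data());  // same address, not just equal text
    a.release();  // freed once; b.buf is now dangling and must not be read
    return shared;
}
```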

further issues

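  1. When b.setString is called, it delete[]s the memory that b.buf (and therefore a.buf) points to. The new buffer allocated for b.buf is generally at a different address, so a.buf is left as a dangling pointer to freed memory and the value "Hello" is lost. Reading through a.buf afterwards is undefined behavior.
  2. Once MyString has a destructor that calls delete[] buf (see the rule of three below), both a and b run delete[] when they go out of scope. a's destructor then deallocates memory that has already been deallocated (by setString here, or by b's destructor if setString had never been called). This double free is undefined behavior, and setting buf to nullptr in one object would not prevent it, because the other object still holds its own copy of the old address.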

deep copy: redefining the copy constructor to handle pointers

A deep copy solves the problem with shallow copies of pointers by defining the copy constructor (the one called by MyString b(a);) such that:

  1. new memory is allocated specifically for the new object, instead of just using a pointer to the same memory
  2. data is copied from original object into that memory

Thus, to fix the code above, we use the following constructor definition to replace the MyString::MyString (const MyString& x) { ... } function.

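MyString::MyString(const MyString &src) {
    // 1. Copy the non-pointer data
    len = src.len;

    // A default-constructed source has buf == nullptr, and strcpy must not
    // be given a null pointer, so there is nothing to copy in that case
    if (src.buf == nullptr) {
        buf = nullptr;
        return;
    }

    // 2. Allocate NEW memory for the new object
    buf = new char[src.len + 1];

    // 3. Copy the DATA from the source buffer into the new buffer
    strcpy(buf, src.buf);
}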

With the deep copy in place, the same sequence of calls in main behaves correctly:

int main(void) {
	MyString a("Hello"); 
	// a.buf points to "Hello"
	MyString b(a);
	// Deep copy! 'b.buf' points to *new* memory that also holds "Hello"
	b.setString("Oops"); // This deletes 'b's buffer and gives it a new one.
	// 'a' is completely unaffected and still safely holds "Hello".
}

We never set b.buf to the same address as a.buf. Instead, we allocate new memory and copy the data stored at a.buf into it, so a.buf and b.buf are completely separate.
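A minimal sketch of the fixed class, assuming the deep-copy constructor above plus a destructor. The data() accessor and the deepCopyIsIndependent() helper are hypothetical, added only so the separation of a.buf and b.buf can be checked; copy assignment is deleted here to keep the sketch focused on the constructor.

```cpp
#include <cassert>
#include <cstring>

class MyString {
  private:
    int len;
    char* buf;

  public:
    MyString(const char* src) {
        assert(src != nullptr);
        len = static_cast<int>(std::strlen(src));
        buf = new char[len + 1];
        std::strcpy(buf, src);
    }

    // Deep copy: allocate a separate buffer and copy the characters into it.
    MyString(const MyString& src) {
        len = src.len;
        buf = new char[src.len + 1];
        std::strcpy(buf, src.buf);
    }

    ~MyString() { delete[] buf; }

    // Deleted to keep the sketch small; a real class needs it (rule of three).
    MyString& operator=(const MyString&) = delete;

    void setString(const char* src) {
        assert(src != nullptr);
        delete[] buf;
        len = static_cast<int>(std::strlen(src));
        buf = new char[len + 1];
        std::strcpy(buf, src);
    }

    const char* data() const { return buf; }  // hypothetical accessor
};

// Mirrors the main() above and reports whether a kept its original text.
bool deepCopyIsIndependent() {
    MyString a("Hello");
    MyString b(a);                                // deep copy
    bool separate = (a.data() != b.data());       // two different buffers
    b.setString("Oops");                          // only b's buffer changes
    return separate && std::strcmp(a.data(), "Hello") == 0
                    && std::strcmp(b.data(), "Oops") == 0;
}
```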


rule of three

If your class needs one of the following, then it almost certainly requires all three:

  1. user-defined destructor
  2. user-defined copy constructor
  3. user-defined copy assignment operator

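Note: if a class does not declare them, the compiler generates all three automatically. The generated copy constructor and copy assignment operator copy each data member (so buf is copied shallowly), and the generated destructor does not free buf. So if a class needs a custom version of one of them (usually because it manages a resource such as dynamically allocated memory), it almost certainly needs custom versions of all three.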
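To see what the compiler-generated versions actually do, the sketch below uses a hypothetical RawString class that declares none of the three. Its implicit copy constructor and copy assignment operator copy each member, so buf is copied as an address (a shallow copy), and its implicit destructor never frees buf, so the buffer is released by hand here.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical class with NO user-declared destructor, copy constructor,
// or copy assignment operator, so the compiler generates all three.
class RawString {
  public:
    int len;
    char* buf;
};

// Returns true if both compiler-generated copy operations copied the
// pointer value (a member-wise, shallow copy) rather than the characters.
bool implicitCopiesAreShallow() {
    char* storage = new char[6];   // "Hello" plus '\0'
    std::strcpy(storage, "Hello");

    RawString a{5, storage};
    RawString b(a);                // implicit copy constructor
    RawString c{0, nullptr};
    c = a;                         // implicit copy assignment operator

    bool shallow = (b.buf == a.buf) && (c.buf == a.buf);

    // The implicit destructor does not free buf, so it is released
    // exactly once, through the original pointer.
    delete[] storage;
    return shallow;
}
```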

In the case of MyString:

  1. A user-defined destructor is needed to free the dynamically allocated memory pointed to by buf when a MyString object goes out of scope or is destroyed.
    MyString::~MyString() {
    	if (buf != nullptr) { // optional check: delete[] on a nullptr does nothing
    		delete[] buf;
    		buf = nullptr; // defensive only: the object is being destroyed anyway
    	}
    }
    
  2. A user-defined copy constructor is needed to copy the data at the pointer, not the pointer itself. In other words, to perform a deep copy. This is the code that we implemented above.
  3. A user-defined copy assignment operator (operator=) is needed so that b = a; doesn't overwrite b.buf with the pointer a.buf. Instead, b's old buffer is freed and the data stored at a.buf is copied into newly allocated memory.
    MyString& MyString::operator=(const MyString& src) {
        // 1. Check for self-assignment (e.g., a = a;); without it,
        //    delete[] buf below would also free src.buf before copying
        if (this == &src) {
            return *this;
        }
        // 2. Delete the OLD data in this object (otherwise it leaks)
        delete[] buf;
        // 3. Perform the deep copy (same as copy constructor)
        len = src.len;
        if (src.buf == nullptr) {
            buf = nullptr; // src was default-constructed: nothing to copy
        } else {
            buf = new char[src.len + 1];
            strcpy(buf, src.buf);
        }
        // 4. Return a reference to this object (for chaining: a = b = c;)
        return *this;
    }
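Putting the three pieces together, a complete MyString that follows the rule of three might look like the sketch below. The data() and length() accessors are hypothetical additions for checking results, and both copy operations assume that a default-constructed object holds buf == nullptr, so they skip strcpy in that case.

```cpp
#include <cassert>
#include <cstring>

class MyString {
  private:
    int len;
    char* buf;

  public:
    MyString() : len(0), buf(nullptr) {}

    MyString(const char* src) {
        assert(src != nullptr);
        len = static_cast<int>(std::strlen(src));
        buf = new char[len + 1];
        std::strcpy(buf, src);
    }

    // 1. Destructor: free the buffer (delete[] on nullptr is a no-op).
    ~MyString() { delete[] buf; }

    // 2. Copy constructor: deep copy; an empty source has no buffer to copy.
    MyString(const MyString& src) : len(src.len), buf(nullptr) {
        if (src.buf != nullptr) {
            buf = new char[len + 1];
            std::strcpy(buf, src.buf);
        }
    }

    // 3. Copy assignment: guard self-assignment, free old data, deep copy.
    MyString& operator=(const MyString& src) {
        if (this == &src) {
            return *this;
        }
        delete[] buf;
        buf = nullptr;
        len = src.len;
        if (src.buf != nullptr) {
            buf = new char[len + 1];
            std::strcpy(buf, src.buf);
        }
        return *this;  // enables chaining such as c = b = a;
    }

    const char* data() const { return buf; }  // hypothetical accessor
    int length() const { return len; }        // hypothetical accessor
};
```

Usage: with a defined as MyString a("Hello"), the chain c = b = a; first deep-copies a into b, then deep-copies b into c, leaving three independent buffers that each hold "Hello".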
    

copy constructor vs assignment operator (=)

| Feature | Copy Constructor | Assignment Operator |
| --- | --- | --- |
| Purpose | Initializes a new object | Overwrites an existing object |
| When called | Whenever we create a new object from an existing one, e.g. MyString b(a); or MyString b = a; (also when an object is passed to a function by value) | When an already-existing object is assigned to, e.g. a = b; |
| Return type | None; it's a constructor | MyString&, returning *this (the dereferenced this pointer, see: L14) to allow for chaining |
| Parameter | const MyString& - must be a reference to avoid infinite recursion (see: L15 - Copy Constructors) | Can be either pass-by-value or pass-by-reference, but use pass-by-reference (const MyString&) for efficiency |
| Action | Allocates new memory for buf and copies the source's data into it | For a = b;, deletes the old memory at a.buf, then allocates new memory for a.buf and copies the data at b.buf into it |
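To check which function runs in each situation in the table, the sketch below instruments MyString with static counters. The counters copyCtorCalls and assignCalls and the helper takesByValue() are hypothetical, added only for illustration.

```cpp
#include <cassert>
#include <cstring>

class MyString {
  private:
    int len;
    char* buf;

  public:
    static int copyCtorCalls;  // hypothetical instrumentation
    static int assignCalls;    // hypothetical instrumentation

    MyString(const char* src) {
        assert(src != nullptr);
        len = static_cast<int>(std::strlen(src));
        buf = new char[len + 1];
        std::strcpy(buf, src);
    }

    // Copy constructor: builds a NEW object, so there is no old buffer to free.
    MyString(const MyString& src) : len(src.len), buf(new char[src.len + 1]) {
        std::strcpy(buf, src.buf);
        ++copyCtorCalls;
    }

    // Assignment: the object already exists, so its old buffer is freed first.
    MyString& operator=(const MyString& src) {
        ++assignCalls;
        if (this != &src) {
            delete[] buf;
            len = src.len;
            buf = new char[len + 1];
            std::strcpy(buf, src.buf);
        }
        return *this;
    }

    ~MyString() { delete[] buf; }
};

int MyString::copyCtorCalls = 0;
int MyString::assignCalls = 0;

// Hypothetical helper: taking the parameter by value forces a copy.
void takesByValue(MyString s) {
    (void)s;  // the parameter exists only to trigger the copy constructor
}
```

Note that MyString c = a; uses the = symbol but is an initialization of a new object, so it calls the copy constructor rather than operator=.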