I don't think copy elision[1,2] means what you think it means, it simply allows the compiler to avoid e.g. allocating a new string when returning a string, or avoid allocating a new string to store the result of a temporary. That is, copy elision allows
std::string str = data.substr(2, 3);
return str;
to only allocate one new string (for the return value of substr), instead of two. There's no way the compiler can get out of constructing at least one std::string for the return value, especially if there's any form of dynamic substr'ing (e.g. parsing a CSV file with columns that aren't all the same width).
Sharing is only multithreading unfriendly if there's modification happening, and modification of textual (i.e. Unicode) data is bad practice and hard to get right, since all Unicode encodings are variable width (yes, even UTF-32, it is a variable width encoding of visible characters).
Furthermore, a string_view is strictly better than a string for many applications, since a string_view can always be copied into a string by the caller if necessary (i.e. each function can choose to return the most sensible/most performant thing, which is a string_view if it's just a substring of one of the arguments).
The only sensible argument against string_view in C++ I know is: it's easy to get dangling references. Which is correct, but that's a general problem with C++ itself, not with the idea of string views (Rust has a perfectly safe version in the form of &str, which cannot become dangling like in C++).
> Keep in mind that on a 64-bit architecture a view is at least 16 bytes large and that small strings can be copied to the stack resulting in better locality and reduced memory usage.
No, a string_view points into memory that already exists, there's no increased memory usage; a small string copied on to the stack will be part of the string struct, which is at least 3 * 8 = 24 bytes: a pointer, the length and the capacity. Also, a memcpy out of the original string is always going to be more expensive than just getting the pointer/length (or pair of pointers) for a string_view, since the memcpy has to do this anyway.
Yeah my example for copy elision sucked, but that doesn't mean it cannot play in favor when you work by value.
Sharing is only multithreading unfriendly if there's modification happening, modification of textual (i.e. Unicode) data is bad practice and hard to get right
Read-only access to data indeed scales "infinitely" on modern architectures.
No, a string_view points into memory that already exists,
Yes. Right. How do you store that? You need at least one pointer and and an int or two pointers. That 16 bytes. Memcpy for a couple of bytes is very quick when it's stack to stack thanks to page locality.
Also, if you are using pointers you will have aliasing issues which will have an impact on performance. If you work by values you allow the compiler to optimize things better.
For small strings string view are just dumb and "most of the time" strings are very small.
To give a better example of why working a string view is both a bad idea and dangerous, it's as if you said "I don't want to copy this vector, therefore I will work on iterators". That's obviously a bad idea.
> Yeah my example for copy elision sucked, but that doesn't mean it cannot play in favor when you work by value.
Not just sucked; it was entirely wrong. Copy elision is not related to std::string vs. string_view. Even with copy elision turned up to 11, returning a std::string will be more expensive than a string_view.
> How do you store that? You need at least one pointer and and an int or two pointers. That 16 bytes. Memcpy for a couple of bytes is very quick when it's stack to stack thanks to page locality.
I was very careful to cover exactly this in my comment.
Computing the memcpy is strictly more work than creating a string_view, since you need the information that is stored in a string view (i.e. pointer and length) to call memcpy.
Furthermore, the 'stack string' is actually stored contained inside a std::string value, which is larger than 16 bytes. There is no way that returning a string_view causes higher memory use at the call site than returning a std::string. (If you're complaining that it forces old strings to be kept around, well, you can always copy a string_view to a new std::string if you need to, i.e. a string_view can do the expensive 'upgrade' option on demand.)
Here's the quote from my comment above:
> a small string copied on to the stack will be part of the string struct, which is at least 3 * 8 = 24 bytes: a pointer, the length and the capacity. Also, a memcpy out of the original string is always going to be more expensive than just getting the pointer/length (or pair of pointers) for a string_view, since the memcpy has to do this anyway.
> Also, if you are using pointers you will have aliasing issues which will have an impact on performance. If you work by values you allow the compiler to optimize things better.
You do realise that a std::string contains pointers and so on inside it? Furthermore, the small string optimisation (copying to the stack) means every data access to a std::string includes an extra branch.
> For small strings string view are just dumb and "most of the time" strings are very small.
So instead of just having a cheap reference into a string you're happy with the overhead of a function call (memcpy) and a pile of dynamic branches? I wouldn't be surprised if the branches are the major performance burden for std::string-based code that is processing a pile of substrings of some parent string. In this case, the data from the string_views will normally be in cache anyway (i.e. it will've been recently read by the function that decides who to slice into the string_view).
> To give a better example of why working a string view is both a bad idea and dangerous, it's as if you said "I don't want to copy this vector, therefore I will work on iterators". That's obviously a bad idea.
It's not obviously bad to me. In fact, it seems very reasonable to work with iterators rather than copying vectors (isn't that exactly what the algorithm header does?).
If your problem is that it is unsafe and hard to avoid dangling pointers etc, that's just a fundamental problem of C++ and is unavoidable in that language. One fix would be to use Rust; it handles iterators and string_views safely.
Sharing is only multithreading unfriendly if there's modification happening, and modification of textual (i.e. Unicode) data is bad practice and hard to get right, since all Unicode encodings are variable width (yes, even UTF-32, it is a variable width encoding of visible characters).
Furthermore, a string_view is strictly better than a string for many applications, since a string_view can always be copied into a string by the caller if necessary (i.e. each function can choose to return the most sensible/most performant thing, which is a string_view if it's just a substring of one of the arguments).
The only sensible argument against string_view in C++ I know is: it's easy to get dangling references. Which is correct, but that's a general problem with C++ itself, not with the idea of string views (Rust has a perfectly safe version in the form of &str, which cannot become dangling like in C++).
> Keep in mind that on a 64-bit architecture a view is at least 16 bytes large and that small strings can be copied to the stack resulting in better locality and reduced memory usage.
No, a string_view points into memory that already exists, there's no increased memory usage; a small string copied on to the stack will be part of the string struct, which is at least 3 * 8 = 24 bytes: a pointer, the length and the capacity. Also, a memcpy out of the original string is always going to be more expensive than just getting the pointer/length (or pair of pointers) for a string_view, since the memcpy has to do this anyway.
[1]: http://en.wikipedia.org/wiki/Copy_elision
[2]: http://definedbehavior.blogspot.com/2011/08/value-semantics-...