/****************************************************************************
** MIT License
**
** Copyright (C) 2020-2022 Klarälvdalens Datakonsult AB, a KDAB Group company, info@kdab.com, author Marc Mutz <marc.mutz@kdab.com>
**
** This file is part of KDToolBox (https://github.com/KDAB/KDToolBox).
**
** Permission is hereby granted, free of charge, to any person obtaining a copy
** of this software and associated documentation files (the "Software"), to deal
** in the Software without restriction, including without limitation the rights
** to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
** copies of the Software, and to permit persons to whom the Software is
** furnished to do so, subject to the following conditions:
**
** The above copyright notice and this permission notice (including the next paragraph)
** shall be included in all copies or substantial portions of the Software.
**
** THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
** IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
** FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
** AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
** LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
** ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
** DEALINGS IN THE SOFTWARE.
****************************************************************************/

#include "qstringtokenizer.h"
#include "qstringalgorithms.h"

/*!
    \class QStringTokenizer
    \brief The QStringTokenizer class splits strings into tokens along given separators.
    \reentrant

    Splits a string into substrings wherever a given separator occurs,
    and returns a (lazy) list of those strings.
    If the separator does
    not match anywhere in the string, produces a single-element list
    containing this string. If the separator is empty,
    QStringTokenizer produces an empty string, followed by each of the
    string's characters, followed by another empty string. The two
    enumerations Qt::SplitBehavior and Qt::CaseSensitivity further
    control the output.

    QStringTokenizer drives QStringView::tokenize(), but, at least with a
    recent compiler, you can use it directly, too:

    \code
    for (auto token : QStringTokenizer{string, separator})
        use(token);
    \endcode

    \note You should never, ever, name the template arguments of a
    QStringTokenizer explicitly. If you can use C++17 Class Template
    Argument Deduction (CTAD), you may write
    \c{QStringTokenizer{string, separator}} (without template
    arguments). If you can't use C++17 CTAD, you must use the
    QStringView::split() or QLatin1String::split() member functions
    and store the return value only in \c{auto} variables:

    \code
    auto result = string.split(sep);
    \endcode

    This is because the template arguments of QStringTokenizer have a
    very subtle dependency on the specific string and separator types
    with which they are constructed, and they don't usually
    correspond to the actual types passed.

    \section Lazy Sequences

    QStringTokenizer acts as a so-called lazy sequence, that is, each
    next element is only computed once you ask for it. Lazy sequences
    have the advantage that they only require O(1) memory. They have
    the disadvantage that, at least for QStringTokenizer, they only
    allow forward, not random-access, iteration.
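
    To illustrate what lazy, forward-only iteration means, here is a
    minimal stand-alone sketch. It is not how QStringTokenizer is
    implemented: the class LazyTokens, its members, and the helper
    collect() are hypothetical names, and the sketch assumes C++17,
    std::string_view, a single-character separator, and that empty
    tokens are kept. Each token is only located when the iterator is
    advanced, and the end of the range is marked by a separate, empty
    type:

```cpp
// Hypothetical sketch: LazyTokens and collect() are illustration-only names,
// not part of QStringTokenizer.
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

class LazyTokens {
public:
    struct sentinel {}; // empty end marker, deliberately not an iterator

    class iterator {
    public:
        iterator(std::string_view haystack, char needle)
            : m_rest(haystack), m_needle(needle)
        {
            advance(); // locate only the first token up front
        }

        std::string_view operator*() const { return m_current; }
        iterator &operator++() { advance(); return *this; }
        // The range ends once advance() had nothing left to emit.
        bool operator!=(sentinel) const { return !m_done; }

    private:
        void advance()
        {
            if (m_exhausted) {
                m_done = true;
                return;
            }
            const std::size_t pos = m_rest.find(m_needle);
            if (pos == std::string_view::npos) {
                m_current = m_rest; // last token: whatever is left
                m_exhausted = true;
            } else {
                m_current = m_rest.substr(0, pos);
                m_rest.remove_prefix(pos + 1); // skip token and separator
            }
        }

        std::string_view m_rest;    // part of the input not yet tokenized
        std::string_view m_current; // token the iterator points at
        char m_needle;
        bool m_exhausted = false;   // the last token has been produced
        bool m_done = false;        // iteration has moved past the last token
    };

    LazyTokens(std::string_view haystack, char needle)
        : m_haystack(haystack), m_needle(needle) {}

    iterator begin() const { return iterator{m_haystack, m_needle}; }
    sentinel end() const { return {}; }

private:
    std::string_view m_haystack;
    char m_needle;
};

// Copies the tokens into a vector; the sequence itself needs no such storage.
inline std::vector<std::string> collect(std::string_view haystack, char needle)
{
    std::vector<std::string> out;
    for (std::string_view token : LazyTokens{haystack, needle})
        out.emplace_back(token);
    return out;
}
```

    For example, collect("a,b,,c", ',') visits four tokens: "a", "b",
    an empty token, and "c", computing each one only when the loop
    reaches it.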

    The intended use-case is that you just plug it into a ranged for loop:

    \code
    for (auto token : QStringTokenizer{string, separator})
        use(token);
    \endcode

    or a C++20 ranged algorithm:

    \code
    std::ranges::for_each(QStringTokenizer{string, separator},
                          [] (auto token) { use(token); });
    \endcode

    \section End Sentinel

    The QStringTokenizer iterators cannot be used with classical STL
    algorithms, because those require iterator/iterator pairs, while
    QStringTokenizer uses sentinels, that is, it uses a different
    type, QStringTokenizer::sentinel, to mark the end of the
    range. This improves performance, because the sentinel is an empty
    type. Sentinels are supported from C++17 (for ranged for)
    and C++20 (for algorithms using the new ranges library).

    QStringTokenizer falls back to a non-sentinel end iterator
    implementation if the compiler doesn't support separate types for
    begin and end iterators in ranged for loops
    (\link{https://wg21.link/P0184}{P0184}), in which case traditional
    STL algorithms will \em appear to be supported, but as you migrate
    to a compiler that supports P0184, such code will break. We
    recommend using only the C++20 \c{std::ranges} algorithms, or, if
    you're stuck on C++14/17 for the time being,
    \link{https://github.com/ericniebler/range-v3}{Eric Niebler's
    Ranges v3 library}, which has the same semantics as the C++20
    \c{std::ranges} library.

    \section Temporaries

    QStringTokenizer is very carefully designed to avoid dangling
    references.
    If you construct a tokenizer from a temporary string
    (an rvalue), that argument is stored internally, so the referenced
    data isn't deleted before it is tokenized:

    \code
    auto tok = QStringTokenizer{widget.text(), u','};
    // return value of `widget.text()` is destroyed, but content was moved into `tok`
    for (auto e : tok)
        use(e);
    \endcode

    If you pass named objects (lvalues), then QStringTokenizer does
    not store a copy. You are responsible for keeping the named object's
    data around for longer than the tokenizer operates on it:

    \code
    auto text = widget.text();
    auto tok = QStringTokenizer{text, u','};
    text.clear();      // destroy content of `text`
    for (auto e : tok) // ERROR: `tok` references deleted data!
        use(e);
    \endcode

    \sa QStringView::split(), QLatin1String::split(), Qt::SplitBehavior, Qt::CaseSensitivity
*/

/*!
    \typedef QStringTokenizer::value_type

    Alias for \c{const QStringView} or \c{const QLatin1String},
    depending on the tokenizer's \c Haystack template argument.
*/

/*!
    \typedef QStringTokenizer::difference_type

    Alias for qsizetype.
*/

/*!
    \typedef QStringTokenizer::size_type

    Alias for qsizetype.
*/

/*!
    \typedef QStringTokenizer::reference

    Alias for \c{value_type &}.

    QStringTokenizer does not support mutable references, so this is
    the same as const_reference.
*/

/*!
    \typedef QStringTokenizer::const_reference

    Alias for \c{value_type &}.
*/

/*!
    \typedef QStringTokenizer::pointer

    Alias for \c{value_type *}.

    QStringTokenizer does not support mutable iterators, so this is
    the same as const_pointer.
*/

/*!
    \typedef QStringTokenizer::const_pointer

    Alias for \c{value_type *}.
*/

/*!
    \typedef QStringTokenizer::iterator

    This typedef provides an STL-style const iterator for
    QStringTokenizer.

    QStringTokenizer does not support mutable iterators, so this is
    the same as const_iterator.

    \sa const_iterator
*/

/*!
    \typedef QStringTokenizer::const_iterator

    This typedef provides an STL-style const iterator for
    QStringTokenizer.

    \sa iterator
*/

/*!
    \typedef QStringTokenizer::sentinel

    This typedef provides an STL-style sentinel for
    QStringTokenizer::iterator and QStringTokenizer::const_iterator.

    \sa const_iterator
*/

/*!
    \fn QStringTokenizer(Haystack haystack, String needle, Qt::CaseSensitivity cs, Qt::SplitBehavior sb)
    \fn QStringTokenizer(Haystack haystack, String needle, Qt::SplitBehavior sb, Qt::CaseSensitivity cs)

    Constructs a string tokenizer that splits the string \a haystack
    into substrings wherever \a needle occurs, and allows iteration
    over those strings as they are found. If \a needle does not match
    anywhere in \a haystack, a single element containing \a haystack
    is produced.

    \a cs specifies whether \a needle should be matched case
    sensitively or case insensitively.

    If \a sb is Qt::SkipEmptyParts, empty entries don't
    appear in the result. By default, empty entries are included.

    \sa QStringView::split(), QLatin1String::split(), Qt::CaseSensitivity, Qt::SplitBehavior
*/

/*!
    \fn QStringTokenizer::const_iterator QStringTokenizer::begin() const

    Returns a const \l{STL-style iterators}{STL-style iterator}
    pointing to the first token in the list.

    \sa end(), cbegin()
*/

/*!
    \fn QStringTokenizer::const_iterator QStringTokenizer::cbegin() const

    Same as begin().

    \sa cend(), begin()
*/

/*!
    \fn QStringTokenizer::sentinel QStringTokenizer::end() const

    Returns a const \l{STL-style iterators}{STL-style sentinel}
    pointing to the imaginary token after the last token in the list.

    \sa begin(), cend()
*/

/*!
    \fn QStringTokenizer::sentinel QStringTokenizer::cend() const

    Same as end().

    \sa cbegin(), end()
*/

/*!
    \fn QStringTokenizer::toContainer(Container &&c) const &

    Convenience method to convert the lazy sequence into a
    (typically) random-access container.

    This function is only available if \c Container has a \c value_type
    matching this tokenizer's value_type.

    If you pass in a named container (an lvalue), then that container
    is filled, and a reference to it is returned.

    If you pass in a temporary container (an rvalue, incl. the default
    argument), then that container is filled, and returned by value.

    \code
    // assuming tok's value_type is QStringView, then...
    auto tok = QStringTokenizer{~~~};
    // ... rac1 is a QVector:
    auto rac1 = tok.toContainer();
    // ... rac2 is std::pmr::vector<QStringView>:
    auto rac2 = tok.toContainer<std::pmr::vector<QStringView>>();
    auto rac3 = QVarLengthArray<QStringView, 12>{};
    // appends the token sequence produced by tok to rac3
    // and returns a reference to rac3 (which we ignore here):
    tok.toContainer(rac3);
    \endcode

    This gives you maximum flexibility in how you want the sequence to
    be stored.
*/

/*!
    \fn QStringTokenizer::toContainer(Container &&c) const &&
    \overload

    In addition to the constraints on the lvalue-this overload, this
    rvalue-this overload is only available when this QStringTokenizer
    does not store the haystack internally, as this could create a
    container full of dangling references:

    \code
    auto tokens = QStringTokenizer{widget.text(), u','}.toContainer();
    // ERROR: cannot call toContainer() on rvalue
    // 'tokens' references the data of the copy of widget.text()
    // stored inside the QStringTokenizer, which has since been deleted
    \endcode

    To fix, store the QStringTokenizer in a named variable:

    \code
    auto tokenizer = QStringTokenizer{widget.text(), u','};
    auto tokens = tokenizer.toContainer();
    // OK: the copy of widget.text() stored in 'tokenizer' keeps the data
    // referenced by 'tokens' alive.
    \endcode

    You can force this function into existence by passing a view instead:

    \code
    func(QStringTokenizer{QStringView{widget.text()}, u','}.toContainer());
    // OK: compiler keeps widget.text() around until after func() has executed
    \endcode
*/

/*!
    \fn qTokenize(Haystack &&haystack, Needle &&needle, Flags...flags)
    \relates QStringTokenizer

    Factory function for QStringTokenizer. You can use this function
    if your compiler doesn't yet support C++17 Class Template
    Argument Deduction (CTAD), but we recommend direct use of
    QStringTokenizer with CTAD instead.
*/
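
/*
    The relationship between a factory function and CTAD, as described
    for qTokenize(), can be sketched roughly as follows. This is an
    illustration under assumptions, not the qTokenize() implementation:
    Splitter and makeSplitter() are hypothetical names, and the sketch
    simply decays the argument types, whereas the template arguments of
    QStringTokenizer depend on the argument types in subtler ways. The
    block itself is written in C++17 so that both spellings can be
    compared side by side.

```cpp
// Hypothetical sketch: Splitter and makeSplitter() are illustration-only names.
#include <cassert>
#include <string_view>
#include <type_traits>
#include <utility>

// Stand-in for a class template whose arguments callers should not spell out.
template <typename Haystack, typename Needle>
struct Splitter {
    Splitter(Haystack h, Needle n) : haystack(std::move(h)), needle(std::move(n)) {}
    Haystack haystack;
    Needle needle;
};

// Factory function: function templates have always deduced their arguments,
// so callers get the matching Splitter<...> without naming it.
template <typename Haystack, typename Needle>
Splitter<std::decay_t<Haystack>, std::decay_t<Needle>>
makeSplitter(Haystack &&h, Needle &&n)
{
    return {std::forward<Haystack>(h), std::forward<Needle>(n)};
}

// Compares the factory spelling with the C++17 CTAD spelling.
inline bool sameTypeEitherWay()
{
    auto viaFactory = makeSplitter(std::string_view{"a,b"}, ',');
    Splitter viaCtad{std::string_view{"a,b"}, ','}; // template arguments deduced
    return std::is_same_v<decltype(viaFactory), decltype(viaCtad)>
        && viaFactory.haystack == viaCtad.haystack;
}
```

    In both spellings the result is stored in an auto or deduced
    variable, which matches the advice above never to name the template
    arguments explicitly.
*/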