Tomasz Sowa
78d31861de
add some wide/utf8 conversion methods
...
add the following methods:
size_t int_to_wide(int c, wchar_t * res, size_t max_buf_len);
template<typename StreamIteratorType>
bool utf8_to_wide(StreamIteratorType & iterator_in, const StreamIteratorType & iterator_end, wchar_t * out_buffer, size_t max_buffer_len, int mode = 1, bool * was_buffer_sufficient_large = nullptr);
template<typename StreamType>
bool utf8_to_wide(const StreamType & stream, wchar_t * out_buffer, size_t max_buffer_len, bool * was_buffer_sufficient_large = nullptr, int mode = 1);
template<typename StreamType>
bool wide_stream_to_utf8(StreamType & buffer, char * utf8, std::size_t max_buffer_size, bool * was_buffer_sufficient_large = nullptr, int mode = 1);
2023-07-14 07:41:14 +02:00
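As a rough illustration of what a function with the int_to_wide(...) signature above might do, here is a minimal sketch. The name int_to_wide_sketch, the return convention (number of wchar_t units written, 0 on failure) and the error handling are assumptions made for this example, not the behaviour of the committed code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: write the Unicode code point `c` into `res`.
// Returns the number of wchar_t units written, or 0 if `c` is not a valid
// code point or the buffer is too small (assumed convention).
std::size_t int_to_wide_sketch(int c, wchar_t * res, std::size_t max_buf_len)
{
    assert(res != nullptr || max_buf_len == 0);

    // reject values outside Unicode and the surrogate range
    if (c < 0 || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;

    // with a 16-bit wchar_t, code points above the BMP need a surrogate pair
    if (sizeof(wchar_t) == 2 && c >= 0x10000)
    {
        if (max_buf_len < 2)
            return 0;

        int v = c - 0x10000;
        res[0] = static_cast<wchar_t>(0xD800 + (v >> 10));
        res[1] = static_cast<wchar_t>(0xDC00 + (v & 0x3FF));
        return 2;
    }

    if (max_buf_len < 1)
        return 0;

    res[0] = static_cast<wchar_t>(c);
    return 1;
}
```

The sizeof(wchar_t) check is there because wchar_t is 16 bits wide on some platforms and 32 bits on others, so the same code point may need one or two output units.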
Tomasz Sowa
7e92b5d9d7
add HTMLParser::parse_xml(...) methods
2023-07-04 22:58:43 +02:00
Tomasz Sowa
663233fe2a
make all utf8/wide functions available just by including utf8/utf8.h
...
while here:
- remove utf8/utf8_stream.h; now only utf8/utf8.h needs to be included
- add some new methods for converting from a utf8 stream to wide stream/string
- make some improvements in TextStream:
  - don't use temporary objects to convert utf8/wide
  - add put_stream() which takes TextStreamBase<> as its argument
    (uses an iterator instead of get_char() for reading)
  - let operator<<(const Space & space) serialize to json and not to Space
2022-07-30 03:31:18 +02:00
Tomasz Sowa
b81daf9fb6
set 2-Clause BSD licence in *.cpp files
2022-06-30 13:44:21 +02:00
Tomasz Sowa
74230d667b
change headerfile_picotools_* macros to headerfile_pikotools_*
2022-06-30 12:45:08 +02:00
Tomasz Sowa
cadba907b2
change licence from 3-Clause BSD to 2-Clause BSD
2022-06-30 12:09:22 +02:00
Tomasz Sowa
6b97b1b74a
fix: correctly escape json/xml/csv wide strings
...
A wide string was first converted to utf-8 and then escaped to json/xml/csv,
which is incorrect. It should be escaped first and then converted to utf-8.
Add TextStreamBase<>::iterator and TextStreamBase<>::const_iterator as classes
with a method wchar_t get_unicode_and_advance(const iterator & end)
to return one character either from utf-8 stream or from wide stream.
Let TextStreamBase<>::operator<<(wchar_t v) correctly use utf-8.
2022-02-03 19:08:21 +01:00
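The ordering described in the entry above (escape at the character level, then convert to utf-8) can be sketched as follows. This is only an illustration: append_utf8 and esc_to_xml_utf8 are hypothetical names, the escape table covers just four XML characters, and char32_t is used instead of wchar_t to keep the example independent of the platform's wchar_t size.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: append one Unicode code point to `out` as utf-8.
void append_utf8(char32_t cp, std::string & out)
{
    assert(cp <= 0x10FFFF); // sketch: the caller passes a valid code point

    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical: decide about escaping while still looking at whole
// characters, and only then emit utf-8 bytes.
std::string esc_to_xml_utf8(const std::u32string & in)
{
    std::string out;

    for (char32_t c : in) {
        switch (c) {
        case U'<': out += "&lt;";   break;
        case U'>': out += "&gt;";   break;
        case U'&': out += "&amp;";  break;
        case U'"': out += "&quot;"; break;
        default:   append_utf8(c, out); break;
        }
    }

    return out;
}
```

For example, esc_to_xml_utf8(U"a<b") yields "a&lt;b"; characters outside ASCII pass through the escape step unchanged and are encoded afterwards.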
Tomasz Sowa
17d2c0fb25
- added some conversion methods: esc_to_json(...), esc_to_xml(...), esc_to_csv() (convert/misc.h)
...
- BaseParser: added the ability to read from TextStream and WTextStream
- HTMLParser: added filter(const WTextStream & in, Stream & out, ...) method
- added utf8_stream.h with one method:
template<typename StreamIteratorType>
size_t utf8_to_int(
StreamIteratorType & iterator_in,
StreamIteratorType & iterator_end,
int & res,
bool & correct)
2021-10-12 19:53:11 +02:00
Tomasz Sowa
8ec9350d52
added two functions to utf8:
...
template<typename StreamType> bool utf8_to_wide(const Stream & stream, StreamType & res, bool clear = true, int mode = 1);
template<typename StreamType> bool wide_stream_to_utf8(const Stream & stream, StreamType & utf8, bool clear = true, int mode = 1);
these functions were moved from TextStreamBase
2021-06-25 19:10:01 +02:00
Tomasz Sowa
4d9f5f6c55
Log class has the Stream class as a base class now
...
- implemented some missing operators<<(...)
- removed Manipulators: l1, l2, l3, l4, lend, lsave
- PascalCase to snake_case in Log
added to Stream:
virtual bool is_char_stream() const = 0;
virtual bool is_wchar_stream() const = 0;
virtual char get_char(size_t index) const = 0;
virtual wchar_t get_wchar(size_t index) const = 0;
virtual Stream & operator<<(const Stream & stream) = 0;
2021-06-24 20:52:48 +02:00
Tomasz Sowa
819c49e638
added class Stream (textstream/stream.h) which acts as a base class for TextStream
...
TextStream now performs wide/utf8 conversions
2021-06-20 14:13:23 +02:00
Tomasz Sowa
8b0ed5e750
added to TextStream:
...
TextStreamBase & operator<<(unsigned char);
TextStreamBase & operator<<(bool);
TextStreamBase & operator<<(short);
TextStreamBase & operator<<(unsigned short);
TextStreamBase & operator<<(float);
TextStreamBase & operator<<(long double);
2021-06-15 19:54:50 +02:00
Tomasz Sowa
4d70ae9e87
fixed: use size() when serializing strings - this allows serializing a string which contains a null character
...
fixed: printing null character in space format: \u0000 (previously \0, which is not correct in json)
fixed: in serialize_string_buffer(const char * input_str, ...) a temporary fixed was used when copying input string
added support for surrogate pairs when reading \uHHHH format
added support to parse \u{H...} format (only if parsing Space format)
2021-06-14 13:48:32 +02:00
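The surrogate pair handling mentioned in the entry above relies on the standard UTF-16 arithmetic: two \uHHHH escapes in the ranges D800-DBFF and DC00-DFFF together encode one code point above U+FFFF. A minimal sketch of that step follows; the function names are hypothetical and this is not the parser's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for values read from \uHHHH escapes.
bool is_high_surrogate(std::uint32_t u) { return u >= 0xD800 && u <= 0xDBFF; }
bool is_low_surrogate(std::uint32_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Combine a high and a low surrogate into one code point (UTF-16 rule).
std::uint32_t combine_surrogates(std::uint32_t high, std::uint32_t low)
{
    assert(is_high_surrogate(high) && is_low_surrogate(low));
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
}
```

For example, the escapes \uD83D\uDE00 combine to U+1F600. A reader that sees a high surrogate without a following low one would have to treat the input as malformed; how the parser reacts in that case is not stated in the entry.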
Tomasz Sowa
59d4c9a9c8
changed utf8 functions: PascalCase to snake_case
2021-05-21 00:24:56 +02:00
Tomasz Sowa
b574289054
namespace PT renamed to pt
2021-05-20 16:11:12 +02:00
Tomasz Sowa
3984c29fbf
moved all directories to src subdirectory
2021-05-09 20:11:37 +02:00