Commit Graph

16 Commits

Author SHA1 Message Date
Tomasz Sowa 78d31861de
add some wide/utf8 conversion methods
add the following methods:
size_t int_to_wide(int c, wchar_t * res, size_t max_buf_len);

template<typename StreamIteratorType>
bool utf8_to_wide(StreamIteratorType & iterator_in, const StreamIteratorType & iterator_end, wchar_t * out_buffer, size_t max_buffer_len, int mode = 1, bool * was_buffer_sufficient_large = nullptr);

template<typename StreamType>
bool utf8_to_wide(const StreamType & stream, wchar_t * out_buffer, size_t max_buffer_len, bool * was_buffer_sufficient_large = nullptr, int mode = 1);

template<typename StreamType>
bool wide_stream_to_utf8(StreamType & buffer, char * utf8, std::size_t max_buffer_size, bool * was_buffer_sufficient_large = nullptr, int mode = 1);
2023-07-14 07:41:14 +02:00
Tomasz Sowa 7e92b5d9d7
add HTMLParser::parse_xml(...) methods 2023-07-04 22:58:43 +02:00
Tomasz Sowa 663233fe2a make all utf8/wide functions available just by including utf8/utf8.h
while here:
- remove utf8/utf8_stream.h, now only utf8/utf8.h needs to be included
- add some new methods for converting from a utf8 stream to wide stream/string
- do some improvements in TextStream:
  - don't use temporary objects to convert utf8/wide
  - add put_stream() which takes TextStreamBase<> as its argument
    (uses an iterator instead of get_char() for reading)
  - let operator<<(const Space & space) serialize to json and not to Space
2022-07-30 03:31:18 +02:00
Tomasz Sowa b81daf9fb6 set 2-Clause BSD licence in *.cpp files 2022-06-30 13:44:21 +02:00
Tomasz Sowa 74230d667b change headerfile_picotools_* macros to headerfile_pikotools_* 2022-06-30 12:45:08 +02:00
Tomasz Sowa cadba907b2 change licence from 3-Clause BSD to 2-Clause BSD 2022-06-30 12:09:22 +02:00
Tomasz Sowa 6b97b1b74a fix: correctly escape json/xml/csv wide strings
A wide string was first converted to utf-8 and then escaped to json/xml/csv,
which is incorrect. It should be escaped first and then converted to utf-8.

Add TextStreamBase<>::iterator and TextStreamBase<>::const_iterator as classes
with a method wchar_t get_unicode_and_advance(const iterator & end)
to return one character either from utf-8 stream or from wide stream.

Let TextStreamBase<>::operator<<(wchar_t v) correctly use utf-8.
2022-02-03 19:08:21 +01:00
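
The ordering described in the commit above (escape first, then convert to utf-8) can be illustrated with a minimal sketch. This is not the pikotools code: `append_utf8` and `esc_json_then_utf8` are hypothetical names, and `std::u32string` is assumed as input so that each element is one whole code point regardless of the platform's `wchar_t` size.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: append one Unicode code point as UTF-8 bytes.
static void append_utf8(char32_t c, std::string & out)
{
    if (c < 0x80) {
        out += static_cast<char>(c);
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
}

// Hypothetical function: the escaping decision is made per code point,
// and only afterwards is the character encoded as UTF-8.
std::string esc_json_then_utf8(const std::u32string & in)
{
    std::string out;

    for (char32_t c : in) {
        if (c == U'"') {
            out += "\\\"";
        } else if (c == U'\\') {
            out += "\\\\";
        } else if (c < 0x20) {
            // control characters (including null) are written as \u00XX
            char buf[8];
            std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
            out += buf;
        } else {
            append_utf8(c, out);
        }
    }

    return out;
}
```

Because the escaping step sees complete code points, it never has to inspect individual bytes of a multi-byte UTF-8 sequence.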
Tomasz Sowa 17d2c0fb25 - added some conversion methods: esc_to_json(...), esc_to_xml(...), esc_to_csv() (convert/misc.h)
- BaseParser: added the ability to read from TextStream and WTextStream
- HTMLParser: added filter(const WTextStream & in, Stream & out, ...) method
- added utf8_stream.h with one method:
  template<typename StreamIteratorType>
  size_t utf8_to_int(
    StreamIteratorType & iterator_in,
    StreamIteratorType & iterator_end,
    int & res,
    bool & correct)
2021-10-12 19:53:11 +02:00
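
An iterator-based decoder with the same parameter shape as the `utf8_to_int` signature above might look roughly like this. It is a hypothetical stand-in named `demo_utf8_to_int`, not the pikotools implementation. The return value (bytes consumed) and the meaning of `correct` are assumptions, and overlong forms and surrogate code points are not rejected here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in: reads one code point from [it, end),
// advances 'it' past the bytes it consumed and returns how many that was.
// 'correct' is set to true only when a complete sequence was read.
template<typename Iterator>
std::size_t demo_utf8_to_int(Iterator & it, const Iterator & end, int & res, bool & correct)
{
    res = 0;
    correct = false;

    if (it == end)
        return 0;

    unsigned char first = static_cast<unsigned char>(*it);
    ++it;

    std::size_t len = 0;

    if (first < 0x80)                { res = first;        len = 1; }
    else if ((first & 0xE0) == 0xC0) { res = first & 0x1F; len = 2; }
    else if ((first & 0xF0) == 0xE0) { res = first & 0x0F; len = 3; }
    else if ((first & 0xF8) == 0xF0) { res = first & 0x07; len = 4; }
    else return 1; // invalid lead byte, one byte consumed

    for (std::size_t i = 1; i < len; ++i) {
        if (it == end)
            return i; // truncated sequence

        unsigned char next = static_cast<unsigned char>(*it);

        if ((next & 0xC0) != 0x80)
            return i; // not a continuation byte, leave it unread

        res = (res << 6) | (next & 0x3F);
        ++it;
    }

    correct = true;
    return len;
}
```

Taking iterators instead of a buffer pointer lets the same template read from a `std::string`, a container, or a stream class that exposes iterators.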
Tomasz Sowa 8ec9350d52 added two functions to utf8:
template<typename StreamType> bool utf8_to_wide(const Stream & stream, StreamType & res, bool clear = true, int mode = 1);
template<typename StreamType> bool wide_stream_to_utf8(const Stream & stream, StreamType & utf8, bool clear = true, int mode = 1);

these functions were moved from TextStreamBase
2021-06-25 19:10:01 +02:00
Tomasz Sowa 4d9f5f6c55 Log class has the Stream class as a base class now
- implemented some missing operators<<(...)
- removed Manipulators: l1, l2, l3, l4, lend, lsave
- PascalCase to snake_case in Log

added to Stream:
  virtual bool is_char_stream() const = 0;
  virtual bool is_wchar_stream() const = 0;
  virtual char get_char(size_t index) const = 0;
  virtual wchar_t get_wchar(size_t index) const = 0;
  virtual Stream & operator<<(const Stream & stream) = 0;
2021-06-24 20:52:48 +02:00
Tomasz Sowa 819c49e638 added class Stream (textstream/stream.h) which acts as a base class for TextStream
TextStream now performs wide/utf8 conversions
2021-06-20 14:13:23 +02:00
Tomasz Sowa 8b0ed5e750 added to TextStream:
  TextStreamBase & operator<<(unsigned char);
  TextStreamBase & operator<<(bool);
  TextStreamBase & operator<<(short);
  TextStreamBase & operator<<(unsigned short);
  TextStreamBase & operator<<(float);
  TextStreamBase & operator<<(long double);
2021-06-15 19:54:50 +02:00
Tomasz Sowa 4d70ae9e87 fixed: using size() when serializing strings - this allows to serialize a string which contain a null character
fixed: printing null character in space format: \u0000 (before was \0 which is not correct in json)
fixed: in serialize_string_buffer(const char * input_str, ...) a temporary fixed was used when copying input string
added support for surrogate pairs when reading \uHHHH format
added support to parse \u{H...} format (only if parsing Space format)
2021-06-14 13:48:32 +02:00
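
Surrogate pair support for the \uHHHH format, as mentioned in the commit above, comes down to combining two 16-bit escape values into one code point. The sketch below uses the standard UTF-16 formula. The function names are hypothetical and are not taken from the pikotools parser.

```cpp
#include <cstdint>

// High (lead) surrogates occupy U+D800..U+DBFF.
constexpr bool is_high_surrogate(std::uint32_t u)
{
    return u >= 0xD800 && u <= 0xDBFF;
}

// Low (trail) surrogates occupy U+DC00..U+DFFF.
constexpr bool is_low_surrogate(std::uint32_t u)
{
    return u >= 0xDC00 && u <= 0xDFFF;
}

// Hypothetical helper: combines the values of two consecutive \uHHHH escapes.
// Returns false (and leaves code_point untouched) if they are not a valid pair.
inline bool combine_surrogates(std::uint32_t high, std::uint32_t low, std::uint32_t & code_point)
{
    if (!is_high_surrogate(high) || !is_low_surrogate(low))
        return false;

    code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    return true;
}
```

For example, the escapes \uD83D\uDE00 combine to the code point U+1F600.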
Tomasz Sowa 59d4c9a9c8 changed utf8 functions: PascalCase to snake_case 2021-05-21 00:24:56 +02:00
Tomasz Sowa b574289054 namespace PT renamed to pt 2021-05-20 16:11:12 +02:00
Tomasz Sowa 3984c29fbf moved all directories to src subdirectory 2021-05-09 20:11:37 +02:00