
35 Commits

Author SHA1 Message Date
Tomasz Sowa c0838de3a4
use a char32_t character in the base Stream class
Add operator<<(char32_t) to the Stream class; char32_t will be used
as the main character type instead of wchar_t (this is needed on systems
where sizeof(wchar_t) is equal to 2).

while here:
- add to utf8:
  size_t wide_to_int(const Stream & stream, size_t stream_index, int & res, bool & correct)
  template<typename StreamType, typename OutputFunction> bool wide_to_output_function(StreamType & buffer, OutputFunction output_function, int mode = 1)
  template<typename OutputFunction> bool wide_to_output_function_by_index(const Stream & stream, OutputFunction output_function, int mode)
- add to convert/misc:
  bool try_esc_to_tex(char32_t c, pt::Stream & out)
  bool try_esc_to_html(char32_t c, pt::Stream & out)
2024-05-31 23:11:11 +02:00
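The reason for preferring char32_t can be seen in a minimal sketch of UTF-8 encoding. It assumes a hypothetical free function `append_utf8` (not part of pikotools; the `int_to_stream()` function mentioned below may play a similar role, but its behaviour is not shown in this log). A single char32_t holds any Unicode code point, whereas a 2-byte wchar_t cannot represent code points above U+FFFF on its own.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not the pikotools API): append the UTF-8 encoding
// of one Unicode code point to `out`. Returns false and appends nothing
// for surrogates and values above U+10FFFF, which are not valid code points.
bool append_utf8(char32_t c, std::string & out)
{
    if( (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF )
        return false;

    if( c < 0x80 )
    {
        out += static_cast<char>(c);
    }
    else if( c < 0x800 )
    {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
    else if( c < 0x10000 )
    {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
    else
    {
        // code points above U+FFFF: these do not fit in a 2-byte wchar_t
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }

    return true;
}
```

For example, U+1F600 becomes the four bytes F0 9F 98 80 from a single char32_t value, while a 16-bit wchar_t would first need a surrogate pair.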
Tomasz Sowa 450c5d55e9
leave only utf8.h and utf8.cpp
Remove utf8_private.h, utf8_private.cpp and utf8_templates.h
and move their methods to utf8.h/utf8.cpp.
2024-05-30 21:20:25 +02:00
Tomasz Sowa aacb1f43ae
add some utf8 conversion methods
add new methods:
- bool int_to_stream(int c, pt::Stream & stream);
- template<typename OutputFunction>
  bool utf8_to_output_function(const Stream & stream, OutputFunction output_function, int mode = 1);
- template<typename StreamIteratorType, typename OutputFunction>
  bool utf8_to_output_function(StreamIteratorType & iterator_in, const StreamIteratorType & iterator_end, OutputFunction output_function, int mode = 1);
- template<typename StreamType, typename OutputFunction>
  bool wide_to_output_function(StreamType & buffer, OutputFunction output_function, int mode = 1);

make some methods public:
- size_t wide_to_int(const wchar_t * wide_string, size_t string_len, int & z, bool & correct)
- size_t wide_to_int(const wchar_t * wide_string, int & z, bool & correct)

rename and make some methods public:
- template<typename OutputFunction>
  utf8_to_wide_generic(const char * utf8, size_t utf8_len, OutputFunction convert_function, int mode) -> utf8_to_output_function(...)

while here:
- fix: correctly convert characters in Log::put_multiline_generic()
2024-05-30 20:19:04 +02:00
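The OutputFunction idea behind utf8_to_output_function() can be sketched as follows: decode the input and hand every code point to a caller-supplied callable instead of building an intermediate string. This is a hypothetical `utf8_for_each`, not the pikotools implementation; the real function's `mode` parameter is not described in this log, so it is left out, and replacing malformed input with U+FFFD is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch (not pikotools): decode UTF-8 and call
// output_function(char32_t) once per code point. Malformed sequences are
// reported as U+FFFD and make the function return false.
template<typename OutputFunction>
bool utf8_for_each(const std::string & utf8, OutputFunction output_function)
{
    bool all_correct = true;
    std::size_t i = 0;

    while( i < utf8.size() )
    {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::size_t len;
        char32_t c;

        if( lead < 0x80 )                { len = 1; c = lead; }
        else if( (lead & 0xE0) == 0xC0 ) { len = 2; c = lead & 0x1F; }
        else if( (lead & 0xF0) == 0xE0 ) { len = 3; c = lead & 0x0F; }
        else if( (lead & 0xF8) == 0xF0 ) { len = 4; c = lead & 0x07; }
        else                             { len = 0; c = 0; }

        bool ok = (len != 0 && i + len <= utf8.size());

        // every continuation byte must have the form 10xxxxxx
        for(std::size_t k = 1 ; ok && k < len ; ++k)
        {
            unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
            ok = ((cont & 0xC0) == 0x80);
            c = (c << 6) | (cont & 0x3F);
        }

        // reject overlong forms, surrogates and values above U+10FFFF
        static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
        if( ok && (c < min_for_len[len] || (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF) )
            ok = false;

        if( ok )
        {
            output_function(c);
            i += len;
        }
        else
        {
            output_function(U'\uFFFD');
            all_correct = false;
            i += 1; // skip one byte and try to resynchronize
        }
    }

    return all_correct;
}
```

A lambda such as `[&](char32_t c){ out += c; }` that appends to a std::u32string, or one that writes to a stream, can serve as the output function.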
Tomasz Sowa 5fd17175c1
add a TextStreamBase<>::operator<<(morm::Model & model) 2024-05-29 16:37:53 +02:00
Tomasz Sowa b3137a7607 rename functions for converting strings to integers to snake case
while here:
- add some functions taking std::string/std::wstring
2022-11-14 03:20:17 +01:00
Tomasz Sowa 663233fe2a make all utf8/wide functions available just by including utf8/utf8.h
while here:
- remove utf8/utf8_stream.h; now only utf8/utf8.h needs to be included
- add some new methods for converting from a utf8 stream to a wide stream/string
- do some improvements in TextStream:
  - don't use temporary objects to convert utf8/wide
  - add put_stream() which takes TextStreamBase<> as its argument
    (uses an iterator instead of get_char() for reading)
  - make operator<<(const Space & space) serialize to json rather than to the Space format
2022-07-30 03:31:18 +02:00
Tomasz Sowa b81daf9fb6 set 2-Clause BSD licence in *.cpp files 2022-06-30 13:44:21 +02:00
Tomasz Sowa 74230d667b change headerfile_picotools_* macros to headerfile_pikotools_* 2022-06-30 12:45:08 +02:00
Tomasz Sowa cadba907b2 change licence from 3-Clause BSD to 2-Clause BSD 2022-06-30 12:09:22 +02:00
Tomasz Sowa 3173042229 make depend 2022-04-26 23:47:27 +02:00
Tomasz Sowa 6b97b1b74a fix: correctly escape json/xml/csv wide strings
A wide string was first converted to utf-8 and then escaped to json/xml/csv,
which is incorrect. It should be escaped first and then converted to utf-8.

Add TextStreamBase<>::iterator and TextStreamBase<>::const_iterator as classes
with a method wchar_t get_unicode_and_advance(const iterator & end)
which returns one character from either a utf-8 stream or a wide stream.

Make TextStreamBase<>::operator<<(wchar_t v) use utf-8 correctly.
2022-02-03 19:08:21 +01:00
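The corrected order (escape first, then convert to utf-8) can be illustrated with a minimal JSON sketch. `json_escape_to_utf8` and `put_utf8` are hypothetical names, not pikotools' `esc_to_json()`. Escaping is decided on whole code points, and only characters that are not escaped are encoded to utf-8, so no escaping rule ever has to inspect individual bytes of a multi-byte sequence.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: encode one code point as UTF-8 (input assumed valid).
static void put_utf8(char32_t c, std::string & out)
{
    if( c < 0x80 )
    {
        out += static_cast<char>(c);
    }
    else if( c < 0x800 )
    {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
    else if( c < 0x10000 )
    {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
}

// Hypothetical sketch: JSON-escape a string of code points, producing UTF-8.
// Each code point is first checked against the escaping rules and only then
// (if it is not escaped) converted to UTF-8.
std::string json_escape_to_utf8(const std::u32string & in)
{
    std::string out;

    for(char32_t c : in)
    {
        switch( c )
        {
        case U'"':  out += "\\\""; break;
        case U'\\': out += "\\\\"; break;
        case U'\n': out += "\\n";  break;
        case U'\r': out += "\\r";  break;
        case U'\t': out += "\\t";  break;
        default:
            if( c < 0x20 )
            {
                // remaining control characters (including U+0000) use \u00XX
                char buf[8];
                std::snprintf(buf, sizeof(buf), "\\u%04X", static_cast<unsigned>(c));
                out += buf;
            }
            else
            {
                put_utf8(c, out); // only now is the code point turned into UTF-8
            }
        }
    }

    return out;
}
```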
Tomasz Sowa 17d2c0fb25 - added some conversion methods: esc_to_json(...), esc_to_xml(...), esc_to_csv() (convert/misc.h)
- BaseParser: added the ability to read from TextStream and WTextStream
- HTMLParser: added filter(const WTextStream & in, Stream & out, ...) method
- added utf8_stream.h with one method:
  template<typename StreamIteratorType>
  size_t utf8_to_int(
    StreamIteratorType & iterator_in,
    StreamIteratorType & iterator_end,
    int & res,
    bool & correct)
2021-10-12 19:53:11 +02:00
Tomasz Sowa 5e4c7e9929 make depend 2021-10-02 21:01:19 +02:00
Tomasz Sowa 2cc9dd69a3 make depend 2021-08-12 21:53:52 +02:00
Tomasz Sowa 7ce07c57f5 added a base class for parsers: BaseParser (convert/baseparser.h|cpp)
it provides methods for reading from strings/files
  (those methods were moved from SpaceParser and CSVParser)
fixed: CSVParser didn't set the input_as_utf8 flag
2021-07-17 14:38:22 +02:00
Tomasz Sowa bdb2616f32 added: HTMLFilter (html/htmlfilter.h|cpp) - copied from winix project 2021-07-17 13:35:10 +02:00
Tomasz Sowa e31ef3c6c4 make depend 2021-06-27 22:34:05 +02:00
Tomasz Sowa 792057a869 make depend 2021-06-24 21:18:48 +02:00
Tomasz Sowa 3c0b59e115 added to Space: long double in Space::Value and methods for converting from/to long double
added global methods for converting float/string, double/string and long double/string (convert/double.h|cpp):
      float to_float(const char * str, const char ** after = nullptr);
      float to_float(const wchar_t * str, const wchar_t ** after = nullptr);
      double to_double(const char * str, const char ** after = nullptr);
      double to_double(const wchar_t * str, const wchar_t ** after = nullptr);
      long double to_long_double(const char * str, const char ** after = nullptr);
      long double to_long_double(const wchar_t * str, const wchar_t ** after = nullptr);
      float to_float(const std::string & str, const char ** after = nullptr);
      float to_float(const std::wstring & str, const wchar_t ** after = nullptr);
      double to_double(const std::string & str, const char ** after = nullptr);
      double to_double(const std::wstring & str, const wchar_t ** after = nullptr);
      long double to_long_double(const std::string & str, const char ** after = nullptr);
      long double to_long_double(const std::wstring & str, const wchar_t ** after = nullptr);
      std::string to_str(float val);
      std::wstring to_wstr(float val);
      std::string to_str(double val);
      std::wstring to_wstr(double val);
      std::string to_str(long double val);
      std::wstring to_wstr(long double val);
2021-06-23 17:01:43 +02:00
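The optional `after` argument in the signatures above can be illustrated with a sketch built on the standard `std::strtod`. `to_double_sketch` is a hypothetical name, and the assumption that pikotools sets `after` to the first unconsumed character is based only on the parameter name; its real implementation may differ (for instance, `std::strtod` depends on the current C locale's decimal point and skips leading whitespace).

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical sketch of a to_double(const char * str, const char ** after)
// style function: if `after` is given, it receives a pointer to the first
// character that was not part of the number.
double to_double_sketch(const char * str, const char ** after = nullptr)
{
    char * end = nullptr;
    double value = std::strtod(str, &end);

    if( after )
        *after = end;

    return value;
}

// std::string overload; `after` points into str's buffer, so it is only
// valid while `str` is alive and unmodified.
double to_double_sketch(const std::string & str, const char ** after = nullptr)
{
    return to_double_sketch(str.c_str(), after);
}
```

With the std::string overload, passing a temporary string and keeping `after` would leave a dangling pointer, so the caller should keep the string alive for as long as `after` is used.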
Tomasz Sowa 865837d911 fixed in Space::find_child_space_const(...): clang AddressSanitizer reported stack-use-after-scope
a reference to a Space was used instead of a pointer, so a local object was created and returned

==15076==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fffffffc7c0 at pc 0x000800a5d1bd bp 0x7fffffffc700 sp 0x7fffffffc6f8
READ of size 4 at 0x7fffffffc7c0 thread T0
    #0 0x800a5d1bc in pt::Space::is_object() const /usr/home/tomek/roboczy/prog/pikotools/src/space/space.cpp:778:9
    #1 0x800a67046 in pt::Space::get_object_field(wchar_t const*) /usr/home/tomek/roboczy/prog/pikotools/src/space/space.cpp:1519:6
    #2 0x800a6761c in pt::Space::get_table(wchar_t const*) /usr/home/tomek/roboczy/prog/pikotools/src/space/space.cpp:1582:18
    #3 0x800a694cb in pt::Space::find_child_space_table() /usr/home/tomek/roboczy/prog/pikotools/src/space/space.cpp:1953:9
    #4 0x800855718 in Winix::TimeZone::SetTz(pt::Space&) /usr/home/tomek/roboczy/prog/winix/winixd/core/timezone.cpp:316:45
    #5 0x80085b3a9 in Winix::TimeZones::ParseZones() /usr/home/tomek/roboczy/prog/winix/winixd/core/timezones.cpp:134:18
    #6 0x80085c04b in Winix::TimeZones::ReadTimeZones(wchar_t const*) /usr/home/tomek/roboczy/prog/winix/winixd/core/timezones.cpp:176:3
    #7 0x80085c69f in Winix::TimeZones::ReadTimeZones(std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t> > const&) /usr/home/tomek/roboczy/prog/winix/winixd/core/timezones.cpp:199:9
    #8 0x80083c380 in Winix::System::ReadTimeZones() /usr/home/tomek/roboczy/prog/winix/winixd/core/system.cpp:122:13
    #9 0x80083ca19 in Winix::System::Init() /usr/home/tomek/roboczy/prog/winix/winixd/core/system.cpp:172:2
    #10 0x80069ce41 in Winix::App::Init() /usr/home/tomek/roboczy/prog/winix/winixd/core/app.cpp:355:9
    #11 0x2de92e in main /usr/home/tomek/roboczy/prog/winix/winixd/main/main.cpp:206:11

Address 0x7fffffffc7c0 is located in stack of thread T0 at offset 128 in frame
    #0 0x800a66f3f in pt::Space::get_object_field(wchar_t const*) /usr/home/tomek/roboczy/prog/pikotools/src/space/space.cpp:1518

  This frame has 3 object(s):
    [32, 40) 'i' (line 1521)
    [64, 88) 'ref.tmp' (line 1521)
    [128, 136) 'ref.tmp4' (line 1523) <== Memory access at offset 128 is inside this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope /usr/home/tomek/roboczy/prog/pikotools/src/space/space.cpp:778:9 in pt::Space::is_object() const
Shadow bytes around the buggy address:
  0x4ffffffff8a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4ffffffff8b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4ffffffff8c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4ffffffff8d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4ffffffff8e0: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 f8 f2 f2 f2
=>0x4ffffffff8f0: f8 f8 f8 f2 f2 f2 f2 f2[f8]f3 f3 f3 00 00 00 00
  0x4ffffffff900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4ffffffff910: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4ffffffff920: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 f8 f2 f2 f2
  0x4ffffffff930: f8 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
  0x4ffffffff940: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==15076==ABORTING
2021-06-18 18:52:24 +02:00
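The class of bug described above (a reference bound to a local or temporary object that is used after it goes out of scope) and the pointer-based fix can be sketched with a simplified, hypothetical `Node` type. This is not pt::Space; it only shows why a lookup that may fail is safer returning a pointer with nullptr for "not found".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified tree node (not pt::Space).
struct Node
{
    std::string name;
    std::vector<Node> children; // std::vector allows an incomplete element type (C++17)
};

// Buggy pattern, kept only as a comment: when nothing is found there is no
// real object to refer to, so a temporary is created and a reference to it
// escapes; reading through it later is stack-use-after-scope.
//
//   const Node & find_child(const Node & parent, const std::string & name)
//   {
//       for(const Node & child : parent.children)
//           if( child.name == name ) return child;
//       return Node(); // reference to a temporary: dangles immediately
//   }

// Fixed pattern: return a pointer and use nullptr for "not found",
// so no temporary object is ever created.
const Node * find_child(const Node & parent, const std::string & name)
{
    for(const Node & child : parent.children)
    {
        if( child.name == name )
            return &child;
    }

    return nullptr;
}
```

Callers then check the returned pointer (`if( const Node * n = find_child(root, "tz") ) ...`) instead of receiving a reference to an object that no longer exists.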
Tomasz Sowa 6d2503ae0e make depend 2021-06-16 23:44:21 +02:00
Tomasz Sowa 8b0ed5e750 added to TextStream:
  TextStreamBase & operator<<(unsigned char);
  TextStreamBase & operator<<(bool);
  TextStreamBase & operator<<(short);
  TextStreamBase & operator<<(unsigned short);
  TextStreamBase & operator<<(float);
  TextStreamBase & operator<<(long double);
2021-06-15 19:54:50 +02:00
Tomasz Sowa 4d70ae9e87 fixed: use size() when serializing strings; this allows serializing a string that contains a null character
fixed: print a null character in the Space format as \u0000 (previously \0, which is not valid in json)
fixed: in serialize_string_buffer(const char * input_str, ...) a temporary fixed was used when copying input string
added support for surrogate pairs when reading the \uHHHH format
added support for parsing the \u{H...} format (only when parsing the Space format)
2021-06-14 13:48:32 +02:00
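Surrogate-pair support when reading \uHHHH can be sketched as follows: a high surrogate (D800-DBFF) must be followed by a second \uHHHH holding a low surrogate (DC00-DFFF), and the two are combined into one code point above U+FFFF. `decode_u_escape` and `read_hex4` are hypothetical names, not the pikotools parser, and the \u{H...} form is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: read exactly 4 hex digits starting at `pos`.
static bool read_hex4(const std::string & s, std::size_t pos, char32_t & out)
{
    if( pos + 4 > s.size() )
        return false;

    out = 0;

    for(std::size_t i = pos ; i < pos + 4 ; ++i)
    {
        char ch = s[i];
        out <<= 4;

        if( ch >= '0' && ch <= '9' )      out |= static_cast<char32_t>(ch - '0');
        else if( ch >= 'a' && ch <= 'f' ) out |= static_cast<char32_t>(ch - 'a' + 10);
        else if( ch >= 'A' && ch <= 'F' ) out |= static_cast<char32_t>(ch - 'A' + 10);
        else return false;
    }

    return true;
}

// Hypothetical sketch: decode one "\uHHHH" escape (pos points at the
// backslash), combining a surrogate pair "\uD83D\uDE00" into U+1F600.
// Returns the number of characters consumed, or 0 on error.
std::size_t decode_u_escape(const std::string & s, std::size_t pos, char32_t & res)
{
    char32_t first, second;

    if( pos > s.size() || s.compare(pos, 2, "\\u") != 0 || !read_hex4(s, pos + 2, first) )
        return 0;

    if( first >= 0xDC00 && first <= 0xDFFF )
        return 0; // a lone low surrogate is invalid

    if( first < 0xD800 || first > 0xDBFF )
    {
        res = first; // an ordinary character from the Basic Multilingual Plane
        return 6;
    }

    // a high surrogate must be followed by "\u" and a low surrogate
    if( s.compare(pos + 6, 2, "\\u") != 0 || !read_hex4(s, pos + 8, second) ||
        second < 0xDC00 || second > 0xDFFF )
        return 0;

    res = 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00);
    return 12;
}
```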
Tomasz Sowa 59d4c9a9c8 changed utf8 functions: PascalCase to snake_case 2021-05-21 00:24:56 +02:00
Tomasz Sowa b574289054 namespace PT renamed to pt 2021-05-20 16:11:12 +02:00
Tomasz Sowa 430822bad8 make depend 2021-05-19 03:26:57 +02:00
Tomasz Sowa 0ea5497094 added CSVParser - a csv parser 2021-05-19 03:26:46 +02:00
Tomasz Sowa db93586c0e make depend 2021-05-18 23:58:17 +02:00
Tomasz Sowa ad4e8078ae MainSpaceParser class has been renamed to MainOptionsParser 2021-05-18 23:57:58 +02:00
Tomasz Sowa 96e60c526f moved files: mainspaceparser/mainspaceparser.(h|cpp) -> mainoptions/mainoptionsparser.(h|cpp) 2021-05-18 23:50:42 +02:00
Tomasz Sowa a5c8833452 added tests for MainSpaceParser 2021-05-18 22:57:26 +02:00
Tomasz Sowa 91300bb245 make depend 2021-05-17 03:21:00 +02:00
Tomasz Sowa fe82f63efb changed the way the Makefiles build the project 2021-05-17 03:20:51 +02:00
Tomasz Sowa da6a36a205 start creating tests for MainSpaceParser 2021-05-17 03:19:47 +02:00
Tomasz Sowa ce81670bb6 added 'tests' directory with tests for the pikotools library
currently only tests for convert/text functions
2021-05-10 20:08:50 +02:00