Commit Graph

203 Commits

Author SHA1 Message Date
Tomasz Sowa b781948f21 HTMLParser now correctly parses such entities: & < > " ' 2021-12-02 17:44:41 +01:00
Tomasz Sowa 2dadfc0809 added: HTMLParser::ItemParsedListener listener with an item_parsed(...) method which is called when a tag is parsed by the parser 2021-11-30 16:27:27 +01:00
Tomasz Sowa bb9205a55e added: Space::Space(const Date & date), Space::set(const Date & date), Space::add(const Date & date), Space::add(const wchar_t * field, const Date & date) 2021-11-05 09:27:32 +01:00
Tomasz Sowa 5eff9a5f4f Space::to_bool() now returns true when a string, object or table is non-empty 2021-10-20 08:30:57 +02:00
Tomasz Sowa c54c398828 fixed in HTMLParser: </nofilter> tag was printed 2021-10-13 00:40:55 +02:00
Tomasz Sowa 17d2c0fb25 - added some converting methods: esc_to_json(...), esc_to_xml(...), esc_to_csv() (convert/misc.h)
- BaseParser: added possibility to read from TextStream and WTextStream
- HTMLParser: added filter(const WTextStream & in, Stream & out, ...) method
- added utf8_stream.h with one method:
  template<typename StreamIteratorType>
  size_t utf8_to_int(
    StreamIteratorType & iterator_in,
    StreamIteratorType & iterator_end,
    int & res,
    bool & correct)
2021-10-12 19:53:11 +02:00
Tomasz Sowa 4902eb6037 fixed: in HTMLParser::CheckClosingTags() don't return immediately if stack_len is equal to 2 2021-10-03 13:22:49 +02:00
Tomasz Sowa 5e4c7e9929 make depend 2021-10-02 21:01:19 +02:00
Tomasz Sowa abe349be34 small refactoring in HTMLParser 2021-10-02 21:01:09 +02:00
Tomasz Sowa f23cabfb2f added to HTMLParser: filter_file(...) methods for filtering from a file 2021-10-02 20:34:19 +02:00
Tomasz Sowa 5b2583b566 fixed in HTMLParser: sometimes a closing item was left on the stack, for stack_len < 3 PopStack() was not called 2021-10-02 18:45:02 +02:00
Tomasz Sowa 2cc9dd69a3 make depend 2021-08-12 21:53:52 +02:00
Tomasz Sowa 2576eb12d1 HTMLParser: start working on xml mode
added methods:
Status parse_xml_file(const char * file_name,         Space & out_space, bool compact_mode = false, bool clear_space = true);
Status parse_xml_file(const std::string & file_name,  Space & out_space, bool compact_mode = false, bool clear_space = true);
Status parse_xml_file(const wchar_t * file_name,      Space & out_space, bool compact_mode = false, bool clear_space = true);
Status parse_xml_file(const std::wstring & file_name, Space & out_space, bool compact_mode = false, bool clear_space = true);
2021-08-10 21:56:04 +02:00
Tomasz Sowa b1cc64a29b added a compact_mode option when creating a space output 2021-08-10 01:45:10 +02:00
Tomasz Sowa b8a03bf852 HTMLParser: added possibility to parse html to Space class
added method: HTMLParser::parse_html(const wchar_t * in, Space & space)
2021-08-07 21:21:16 +02:00
Tomasz Sowa 7fcfdac52f Space: added pretty_print parameter to some json serializing methods 2021-08-07 21:19:38 +02:00
Tomasz Sowa 8c5ede5cf3 HTMLParser: for <script> and <!-- (comments) we copy the content without parsing 2021-08-07 02:13:13 +02:00
Tomasz Sowa fdfd0b1385 renamed: HTMLFilter -> HTMLParser 2021-08-06 17:10:19 +02:00
Tomasz Sowa f6df8bc1bc HTMLFilter: added a std::vector<int> stack for a current white mode - white chars mode can be changed by such tags: <textarea>, <pre>, <script>, <nofilter> 2021-07-21 15:57:46 +02:00
Tomasz Sowa c0e940c500 fixed improper new line character after <single/> items, added Item::new_line_before flag 2021-07-21 11:30:49 +02:00
Tomasz Sowa 4f8ae6ce29 some work in HTMLFilter
- instead of directly using pchar pointer now we use pointers/streams from BaseParser
- removed support for putting a white char in long words: removed BreakWord(size_t break_after_) method
- changed how white characters are treated: added white_chars_mode(int mode) method
  mode 0: WHITE_MODE_ORIGIN
  mode 1: WHITE_MODE_SINGLE_LINE
  mode 2: WHITE_MODE_TREE
2021-07-20 20:48:01 +02:00
Tomasz Sowa 7ce07c57f5 added a base class for parsers: BaseParser (convert/baseparser.h|cpp)
there are methods for reading from string/files there
  those methods were moved from SpaceParser and CSVParser
fixed: CSVParser didn't set input_as_utf8 flag
2021-07-17 14:38:22 +02:00
Tomasz Sowa 2a3f43c5c3 added BBCODEParser (html/bbcodeparser.h|cpp) - copied from winix project 2021-07-17 13:54:03 +02:00
Tomasz Sowa bdb2616f32 added: HTMLFilter (html/htmlfilter.h|cpp) - copied from winix project 2021-07-17 13:35:10 +02:00
Tomasz Sowa 1e5598cde1 added to Date: SerializeMonthAsRoman(Stream & out, int month) - serialize month in Roman numerals
added a param: 'bool roman_month' to some serialize methods
2021-07-06 21:44:04 +02:00
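Serializing a month in Roman numerals only needs a twelve-entry lookup table. A minimal sketch follows; the function name month_to_roman and the choice to return an empty string for an out-of-range month are assumptions, not the behaviour of Date::SerializeMonthAsRoman.

```cpp
#include <string>

// Hypothetical helper: returns the month (1..12) as a Roman numeral,
// or an empty string when the month is out of range.
std::string month_to_roman(int month)
{
    static const char * const names[] = {
        "I", "II", "III", "IV", "V", "VI",
        "VII", "VIII", "IX", "X", "XI", "XII"
    };

    if( month < 1 || month > 12 )
        return std::string();

    return names[month - 1];
}
```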
Tomasz Sowa 198945c97b PatternReplacerBase: to_string() changed to to_str() 2021-07-06 21:42:42 +02:00
Tomasz Sowa 34f1fc04cf added Space::remove(size_t table_index) for removing a table item
fixed: pretty printing for Space format
2021-06-29 23:25:31 +02:00
Tomasz Sowa 8997284b16 added trim(...) functions to convert/text.h
void trim_first_white(std::string & str, bool check_additional_chars = true, bool treat_new_line_as_white = true);
void trim_first_white(std::wstring & str, bool check_additional_chars = true, bool treat_new_line_as_white = true);

void trim_last_white(std::string & str, bool check_additional_chars = true, bool treat_new_line_as_white = true);
void trim_last_white(std::wstring & str, bool check_additional_chars = true, bool treat_new_line_as_white = true);

void trim_white(std::string & str, bool check_additional_chars = true, bool treat_new_line_as_white = true);
void trim_white(std::wstring & str, bool check_additional_chars = true, bool treat_new_line_as_white = true);

void trim_first(std::string & str, wchar_t c);
void trim_first(std::wstring & str, wchar_t c);

void trim_last(std::string & str, wchar_t c);
void trim_last(std::wstring & str, wchar_t c);

void trim(std::string & str, wchar_t c);
void trim(std::wstring & str, wchar_t c);
2021-06-29 23:23:35 +02:00
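A minimal sketch of the trimming idea for std::string, assuming that space, tab and carriage return are always white and that a new line counts as white only when treat_new_line_as_white is set. Both function names are hypothetical and the check_additional_chars parameter is left out; the real functions in convert/text.h may classify characters differently.

```cpp
#include <cstddef>
#include <string>

// Assumption: ' ', '\t' and '\r' are always white,
// '\n' is white only when the caller asks for it.
inline bool is_white_sketch(char c, bool treat_new_line_as_white)
{
    if( c == ' ' || c == '\t' || c == '\r' )
        return true;

    return treat_new_line_as_white && c == '\n';
}

// Hypothetical stand-in for trim_white(std::string &, ...):
// removes white characters from both ends of str, in place.
void trim_white_sketch(std::string & str, bool treat_new_line_as_white = true)
{
    std::size_t first = 0;

    while( first < str.size() && is_white_sketch(str[first], treat_new_line_as_white) )
        ++first;

    std::size_t last = str.size();

    while( last > first && is_white_sketch(str[last - 1], treat_new_line_as_white) )
        --last;

    str.erase(last);     // cut the tail first so 'first' stays valid
    str.erase(0, first); // then cut the head
}
```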
Tomasz Sowa e31ef3c6c4 make depend 2021-06-27 22:34:05 +02:00
Tomasz Sowa e0d6e7fcb1 added to Space:
Space & get_add_space(const wchar_t * field);
Space & get_add_space(const std::wstring & field);
2021-06-27 15:58:53 +02:00
Tomasz Sowa 009e240a8d fixed some memory leaks in Space, pointers in tables and objects were not correctly 'deleted', affected methods:
set_empty_table()
set_empty_object()
add(const wchar_t * field, Space && space)
copy_value_object(const Value & value_from)
copy_value_table(const Value & value_from)
initialize_value_object_if_needed(ObjectType && obj)
initialize_value_table_if_needed(TableType && tab)
add_generic(const wchar_t * field, const ArgType & val)
2021-06-27 15:41:38 +02:00
Tomasz Sowa 4a1630b1ea removed support for so-called child objects from Space (this was an old feature of the Space struct, no longer needed)
Space::get_object_field(...) renamed to Space::get_space(...)
2021-06-26 22:56:12 +02:00
Tomasz Sowa 8ec9350d52 added two functions to utf8:
template<typename StreamType> bool utf8_to_wide(const Stream & stream, StreamType & res, bool clear = true, int mode = 1);
template<typename StreamType> bool wide_stream_to_utf8(const Stream & stream, StreamType & utf8, bool clear = true, int mode = 1);

these functions are moved from TextStreamBase
2021-06-25 19:10:01 +02:00
Tomasz Sowa 792057a869 make depend 2021-06-24 21:18:48 +02:00
Tomasz Sowa 4d9f5f6c55 Log class has the Stream class as a base class now
- implemented some missing operators<<(...)
- removed Manipulators: l1, l2, l3, l4, lend, lsave
- PascalCase to snake_case in Log

added to Stream:
  virtual bool is_char_stream() const = 0;
  virtual bool is_wchar_stream() const = 0;
  virtual char get_char(size_t index) const = 0;
  virtual wchar_t get_wchar(size_t index) const = 0;
  virtual Stream & operator<<(const Stream & stream) = 0;
2021-06-24 20:52:48 +02:00
Tomasz Sowa 2b6789754f implemented pretty printing in Space::serialize_to_json_stream(StreamType & str, bool pretty_print, int level) 2021-06-23 21:54:34 +02:00
Tomasz Sowa 3c0b59e115 added to Space: long double to Space::Value and methods for converting from/to long double
added global methods for converting float/string double/string and long double/string (convert/double.h|cpp):
      float to_float(const char * str, const char ** after = nullptr);
      float to_float(const wchar_t * str, const wchar_t ** after = nullptr);
      double to_double(const char * str, const char ** after = nullptr);
      double to_double(const wchar_t * str, const wchar_t ** after = nullptr);
      long double to_long_double(const char * str, const char ** after = nullptr);
      long double to_long_double(const wchar_t * str, const wchar_t ** after = nullptr);
      float to_float(const std::string & str, const char ** after = nullptr);
      float to_float(const std::wstring & str, const wchar_t ** after = nullptr);
      double to_double(const std::string & str, const char ** after = nullptr);
      double to_double(const std::wstring & str, const wchar_t ** after = nullptr);
      long double to_long_double(const std::string & str, const char ** after = nullptr);
      long double to_long_double(const std::wstring & str, const wchar_t ** after = nullptr);
      std::string to_str(float val);
      std::wstring to_wstr(float val);
      std::string to_str(double val);
      std::wstring to_wstr(double val);
      std::string to_str(long double val);
      std::wstring to_wstr(long double val);
2021-06-23 17:01:43 +02:00
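The 'after' parameter in the signatures above resembles the end pointer of std::strtod. The sketch below shows that pattern, assuming the semantics are similar; to_double_sketch is a hypothetical name and the functions in convert/double.h may be implemented differently.

```cpp
#include <cstdlib>

// Hypothetical stand-in built on std::strtod (locale dependent):
// parses a double from str and, if 'after' is not null,
// stores a pointer to the first character that was not parsed.
double to_double_sketch(const char * str, const char ** after = nullptr)
{
    char * end = nullptr;
    double value = std::strtod(str, &end);

    if( after )
        *after = end;

    return value;
}
```

Comparing the returned pointer with the input lets a caller detect that nothing was parsed, or continue parsing the rest of the text.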
Tomasz Sowa c1f1dc96df added Space::serialize_to_string(StreamType & stream) template 2021-06-22 17:52:55 +02:00
Tomasz Sowa 99fbdc1635 in Log::~Log(): removed call to save_log_and_clear()
it creates a problem if a buffer is destroyed first:
2021-06-20 18:19:53 +02:00
Tomasz Sowa 4a2a99a77d removed a comment 2021-06-20 17:39:37 +02:00
Tomasz Sowa 0865c41e48 make depend 2021-06-20 16:47:12 +02:00
Tomasz Sowa ac407b2362 macro renamed: PT_HAS_MORM -> PT_HAS_MORM_LIBRARY
TextStream::to_string(...) is now TextStream::to_str(...)
added: std::string TextStream::to_str() const;
added: std::wstring TextStream::to_wstr() const;
2021-06-20 16:46:08 +02:00
Tomasz Sowa 819c49e638 added class Stream (textstream/stream.h) which acts as a base class for TextStream
TextStream is making conversions wide/utf8 now
2021-06-20 14:13:23 +02:00
Tomasz Sowa 865837d911 fixed in Space::find_child_space_const(...) - clang address sanitizer reports stack-use-after-scope
we have got a reference to a Space instead of a pointer and a local object was created and returned

==15076==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fffffffc7c0 at pc 0x000800a5d1bd bp 0x7fffffffc700 sp 0x7fffffffc6f8
READ of size 4 at 0x7fffffffc7c0 thread T0
    #0 0x800a5d1bc in pt::Space::is_object() const /usr/home/tomek/roboczy/prog/pikotools/src/space/space.cpp:778:9
    #1 0x800a67046 in pt::Space::get_object_field(wchar_t const*) /usr/home/tomek/roboczy/prog/pikotools/src/space/space.cpp:1519:6
    #2 0x800a6761c in pt::Space::get_table(wchar_t const*) /usr/home/tomek/roboczy/prog/pikotools/src/space/space.cpp:1582:18
    #3 0x800a694cb in pt::Space::find_child_space_table() /usr/home/tomek/roboczy/prog/pikotools/src/space/space.cpp:1953:9
    #4 0x800855718 in Winix::TimeZone::SetTz(pt::Space&) /usr/home/tomek/roboczy/prog/winix/winixd/core/timezone.cpp:316:45
    #5 0x80085b3a9 in Winix::TimeZones::ParseZones() /usr/home/tomek/roboczy/prog/winix/winixd/core/timezones.cpp:134:18
    #6 0x80085c04b in Winix::TimeZones::ReadTimeZones(wchar_t const*) /usr/home/tomek/roboczy/prog/winix/winixd/core/timezones.cpp:176:3
    #7 0x80085c69f in Winix::TimeZones::ReadTimeZones(std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t> > const&) /usr/home/tomek/roboczy/prog/winix/winixd/core/timezones.cpp:199:9
    #8 0x80083c380 in Winix::System::ReadTimeZones() /usr/home/tomek/roboczy/prog/winix/winixd/core/system.cpp:122:13
    #9 0x80083ca19 in Winix::System::Init() /usr/home/tomek/roboczy/prog/winix/winixd/core/system.cpp:172:2
    #10 0x80069ce41 in Winix::App::Init() /usr/home/tomek/roboczy/prog/winix/winixd/core/app.cpp:355:9
    #11 0x2de92e in main /usr/home/tomek/roboczy/prog/winix/winixd/main/main.cpp:206:11

Address 0x7fffffffc7c0 is located in stack of thread T0 at offset 128 in frame
    #0 0x800a66f3f in pt::Space::get_object_field(wchar_t const*) /usr/home/tomek/roboczy/prog/pikotools/src/space/space.cpp:1518

  This frame has 3 object(s):
    [32, 40) 'i' (line 1521)
    [64, 88) 'ref.tmp' (line 1521)
    [128, 136) 'ref.tmp4' (line 1523) <== Memory access at offset 128 is inside this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope /usr/home/tomek/roboczy/prog/pikotools/src/space/space.cpp:778:9 in pt::Space::is_object() const
Shadow bytes around the buggy address:
  0x4ffffffff8a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4ffffffff8b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4ffffffff8c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4ffffffff8d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4ffffffff8e0: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 f8 f2 f2 f2
=>0x4ffffffff8f0: f8 f8 f8 f2 f2 f2 f2 f2[f8]f3 f3 f3 00 00 00 00
  0x4ffffffff900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4ffffffff910: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4ffffffff920: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 f8 f2 f2 f2
  0x4ffffffff930: f8 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
  0x4ffffffff940: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==15076==ABORTING
2021-06-18 18:52:24 +02:00
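The report above is the typical result of returning a reference to a local or temporary object. A minimal sketch of the pattern and of a pointer-based alternative follows; Node and find_child are hypothetical names and this is not the actual Space code.

```cpp
// Hypothetical type used only for illustration.
struct Node
{
    int value;
    const Node * child;
};

// Problematic pattern (shown only as a comment): when there is no child,
// a temporary is created and a reference to it is returned,
// so the caller reads a dead stack object.
//
//   const Node & get_child(const Node & n)
//   {
//       return n.child ? *n.child : Node();
//   }

// Safer alternative: return a pointer and let nullptr mean "no child",
// so no placeholder object has to be created.
const Node * find_child(const Node & n)
{
    return n.child;
}
```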
Tomasz Sowa 6d2503ae0e make depend 2021-06-16 23:44:21 +02:00
Tomasz Sowa 06e0f13df9 added a comment in textstream.h 2021-06-16 17:44:24 +02:00
Tomasz Sowa 8b0ed5e750 added to TextStream:
  TextStreamBase & operator<<(unsigned char);
  TextStreamBase & operator<<(bool);
  TextStreamBase & operator<<(short);
  TextStreamBase & operator<<(unsigned short);
  TextStreamBase & operator<<(float);
  TextStreamBase & operator<<(long double);
2021-06-15 19:54:50 +02:00
Tomasz Sowa 4d70ae9e87 fixed: using size() when serializing strings - this allows serializing a string which contains a null character
fixed: printing a null character in space format: \u0000 (before it was \0 which is not correct in json)
fixed: in serialize_string_buffer(const char * input_str, ...) a temporary fixed was used when copying input string
added support for surrogate pairs when reading \uHHHH format
added support to parse \u{H...} format (only if parsing Space format)
2021-06-14 13:48:32 +02:00
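Two consecutive \uHHHH escapes in the surrogate ranges encode a single code point above U+FFFF. A minimal sketch of combining them follows; the function names are hypothetical and this is not the parser's own code.

```cpp
#include <cstdint>

inline bool is_high_surrogate(std::uint32_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool is_low_surrogate(std::uint32_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Hypothetical helper: combines a high and a low surrogate into one code point.
// Returns false (and leaves code_point untouched) if the pair is not valid.
bool combine_surrogates(std::uint32_t high, std::uint32_t low, std::uint32_t & code_point)
{
    if( !is_high_surrogate(high) || !is_low_surrogate(low) )
        return false;

    code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    return true;
}
```

For example, the escapes \uD83D\uDE00 combine to U+1F600.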
Tomasz Sowa 49c2b478c0 fixed return value from Space::add_child_space() 2021-05-21 17:32:10 +02:00
Tomasz Sowa 5ce36ea844 changed how child_spaces are created in the Space class
- removed child_spaces and name pointers
- now a table with child spaces is created under "child_spaces" object field
- a name of the child space is stored in "name" field of the child object

added methods for manipulating child spaces:
TableType * find_child_space_table()
bool child_spaces_empty()
size_t child_spaces_size()

Space * find_child_space(size_t table_index)
Space & add_child_space(const wchar_t * space_name)
Space & add_child_space(const std::wstring & space_name)

std::wstring * find_child_space_name()
std::wstring get_child_space_name()
bool is_child_space_name(const wchar_t * name)

added additional methods:
size_t str_size()
size_t wstr_size()
size_t object_size()
size_t table_size()
2021-05-21 17:13:11 +02:00