Commit Graph

113 Commits

Author SHA1 Message Date
Tomasz Sowa 90915a7209 add a KeyValueParser for parsing simple key/value strings 2023-11-07 03:54:34 +01:00
Tomasz Sowa 21614a5309 fix: memory leak in the HTMLParser when a compact mode was used 2023-10-18 18:33:52 +02:00
Tomasz Sowa 57f49cdcb6 fix: memory leak in the SpaceParser 2023-10-18 16:51:52 +02:00
Tomasz Sowa e94589d6b5 add Date::CompareDate(...) and Date::CompareTime(...) methods 2023-09-07 04:41:41 +02:00
Tomasz Sowa 09215ef5f2 add Space::to_date(...) methods 2023-07-18 18:22:13 +02:00
Tomasz Sowa 2c4bfe085b add == and != operators to the TextStreamBase<> class 2023-07-14 09:07:57 +02:00
Tomasz Sowa 172c2fcee7 add to_str(...) methods to the TextStreamBase<> class
add the following methods:
bool to_str(char * str, size_t max_buf_len) const;
bool to_str(wchar_t * str, size_t max_buf_len) const;
2023-07-14 07:42:09 +02:00
Tomasz Sowa 78d31861de add some wide/utf8 conversion methods
add the following methods:
size_t int_to_wide(int c, wchar_t * res, size_t max_buf_len);

template<typename StreamIteratorType>
bool utf8_to_wide(StreamIteratorType & iterator_in, const StreamIteratorType & iterator_end, wchar_t * out_buffer, size_t max_buffer_len, int mode = 1, bool * was_buffer_sufficient_large = nullptr);

template<typename StreamType>
bool utf8_to_wide(const StreamType & stream, wchar_t * out_buffer, size_t max_buffer_len, bool * was_buffer_sufficient_large = nullptr, int mode = 1);

template<typename StreamType>
bool wide_stream_to_utf8(StreamType & buffer, char * utf8, std::size_t max_buffer_size, bool * was_buffer_sufficient_large = nullptr, int mode = 1);
2023-07-14 07:41:14 +02:00
Tomasz Sowa 7e92b5d9d7 add HTMLParser::parse_xml(...) methods 2023-07-04 22:58:43 +02:00
Tomasz Sowa cbaf57bec3 add some constants to the Date class 2023-07-04 22:57:50 +02:00
Tomasz Sowa 987d9c845c declare esc_to_csv() method with a wstring 2023-05-27 18:18:00 +02:00
Tomasz Sowa 96a3a564cf add a virtual destructor to BaseParser 2022-12-23 04:38:03 +01:00
Tomasz Sowa 379adf6a69 allow parsing a time with a decimal fraction in the ParseTime() method
while here:
- let ParseDate() parse formats without a separator, e.g. "20081012",
  and without the month or day, e.g. "2008" or "200810"
- let ParseTime() parse a time without separators, e.g.
  "141030", "1410" or just "14"
- let the Parse(...) method use ParseDate() and ParseTime();
  this will parse a format similar to ISO 8601
2022-12-23 02:15:11 +01:00
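The separator-free date formats named in this commit ("20081012", "200810", "2008") can be illustrated with a minimal sketch. This is not the pikotools ParseDate() code: the CompactDate struct and parse_compact_date_sketch() are hypothetical names, and defaulting a missing month or day to 1 is an assumption made only for this illustration.

```cpp
#include <cctype>
#include <string>

// Hypothetical result type, not part of pikotools.
struct CompactDate
{
    int year = 0;
    int month = 1; // assumption: a missing month defaults to 1
    int day = 1;   // assumption: a missing day defaults to 1
};

// Parses "YYYY", "YYYYMM" or "YYYYMMDD" (digits only, no separators).
// Returns false and leaves 'out' untouched on any other input.
bool parse_compact_date_sketch(const std::string & s, CompactDate & out)
{
    if( s.size() != 4 && s.size() != 6 && s.size() != 8 )
        return false;

    for(char c : s)
    {
        if( !std::isdigit(static_cast<unsigned char>(c)) )
            return false;
    }

    CompactDate d;
    d.year = std::stoi(s.substr(0, 4));

    if( s.size() >= 6 )
        d.month = std::stoi(s.substr(4, 2));

    if( s.size() == 8 )
        d.day = std::stoi(s.substr(6, 2));

    // Only a coarse range check; days per month are not validated here.
    if( d.month < 1 || d.month > 12 || d.day < 1 || d.day > 31 )
        return false;

    out = d;
    return true;
}
```

The length of the input decides which fields are present, which is why only 4, 6 or 8 digits are accepted in this sketch.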
Tomasz Sowa 3b3c04b85d fix: rename Toul -> to_ul in PatternReplacer 2022-11-16 16:14:16 +01:00
Tomasz Sowa b3137a7607 rename functions for converting strings to integers to snake case
while here:
- add some functions taking std::string/std::wstring
2022-11-14 03:20:17 +01:00
Tomasz Sowa f97c06d441 add a check_time_zone parameter when parsing a date 2022-10-22 16:26:14 +02:00
Tomasz Sowa e501a3f4a3 remove FileLog::synchro_lock() and FileLog::synchro_unlock() 2022-09-01 07:32:48 +02:00
Tomasz Sowa ce0348b2d7 add methods to Space which take a Stream as an argument
- Space::set(const Stream & stream)
- Space::add(const Stream & stream)
- Space::add(const wchar_t * field, const Stream & stream)
- Space::add(const std::wstring & field, const Stream & stream)
2022-08-20 00:26:12 +02:00
Tomasz Sowa 7eba07a439 fix(Space): increment value object iterator in get_space_nc 2022-08-10 12:40:46 +02:00
Tomasz Sowa 663233fe2a make all utf8/wide functions available just by including utf8/utf8.h
while here:
- remove utf8/utf8_stream.h; now only utf8/utf8.h needs to be included
- add some new methods for converting from a utf8 stream to wide stream/string
- do some improvements in TextStream:
  - don't use temporary objects to convert utf8/wide
  - add put_stream() which takes TextStreamBase<> as its argument
    (uses an iterator instead of get_char() for reading)
  - let operator<<(const Space & space) serialize to json and not to Space
2022-07-30 03:31:18 +02:00
Tomasz Sowa 84e9e6f98f add methods to Space that take a pointer to a string along with the length
Space::Space(const char * str, size_t len)
Space::Space(const wchar_t * str, size_t len)
Space::set(const char * str, size_t len)
Space::set(const wchar_t * str, size_t len)
Space::add_to_table(const char * val, size_t len)
Space::add_to_table(const wchar_t * val, size_t len)
Space::add(const wchar_t * field, const char * val, size_t len)
Space::add(const wchar_t * field, const wchar_t * val, size_t len)
Space::add(const std::wstring & field, const char * val, size_t len)
Space::add(const std::wstring & field, const wchar_t * val, size_t len)
2022-07-30 03:12:38 +02:00
Tomasz Sowa 9a596dd097 fix: return a correct value from Log::size and Log::capacity 2022-07-30 02:45:19 +02:00
Tomasz Sowa aa97fe2811 add methods for trimming \r\n from the end of a string
void trim_last_new_lines(std::string & str, bool check_carriage_return_too = true);
void trim_last_new_lines(std::wstring & str, bool check_carriage_return_too = true);
2022-07-30 02:43:29 +02:00
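A minimal sketch of the trimming behaviour the two declarations above suggest, assuming that check_carriage_return_too = false leaves a trailing \r in place. The name trim_last_new_lines_sketch is hypothetical and this is not the library's implementation.

```cpp
#include <string>

// Removes trailing '\n' characters from str, and trailing '\r' characters
// as well when check_carriage_return_too is true.
// Works for std::string and std::wstring alike.
template<typename StringType>
void trim_last_new_lines_sketch(StringType & str, bool check_carriage_return_too = true)
{
    while( !str.empty() )
    {
        auto c = str.back();

        if( c == '\n' || (check_carriage_return_too && c == '\r') )
            str.pop_back();
        else
            break;
    }
}
```

With the default argument, a string ending in "\r\n\n" loses all three characters; with false, trimming stops at the first trailing \r.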
Tomasz Sowa d13c10c604 add methods for converting from hex string to bytes
add to convert/text.h:
template<typename HexStringPointerType, typename BytesStringType>
bool hex_string_pointer_to_bytes(const HexStringPointerType * hex_string, BytesStringType & bytes, bool clear_bytes = true);

template<typename HexStringType, typename BytesStringType>
bool hex_string_to_bytes(const HexStringType & hex_string, BytesStringType & bytes, bool clear_bytes = true);
2022-07-26 05:14:35 +02:00
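A hex-to-bytes conversion with the same parameter shape as hex_string_to_bytes above might look roughly like the sketch below. The names hex_digit_value_sketch and hex_string_to_bytes_sketch are hypothetical, and returning false on an odd length or on a non-hex character is an assumption, since the commit message does not say how invalid input is handled.

```cpp
#include <cstddef>
#include <string>

// Returns 0..15 for a hex digit, -1 for anything else.
inline int hex_digit_value_sketch(long c)
{
    if( c >= '0' && c <= '9' ) return static_cast<int>(c - '0');
    if( c >= 'a' && c <= 'f' ) return static_cast<int>(c - 'a' + 10);
    if( c >= 'A' && c <= 'F' ) return static_cast<int>(c - 'A' + 10);
    return -1;
}

// Converts pairs of hex digits into bytes appended to 'bytes'.
// Assumption: odd-length or non-hex input makes the function return false;
// bytes decoded before the error stay in 'bytes'.
template<typename HexStringType, typename BytesStringType>
bool hex_string_to_bytes_sketch(const HexStringType & hex_string, BytesStringType & bytes, bool clear_bytes = true)
{
    if( clear_bytes )
        bytes.clear();

    if( hex_string.size() % 2 != 0 )
        return false;

    for(std::size_t i = 0 ; i < hex_string.size() ; i += 2)
    {
        int hi = hex_digit_value_sketch(hex_string[i]);
        int lo = hex_digit_value_sketch(hex_string[i + 1]);

        if( hi < 0 || lo < 0 )
            return false;

        bytes.push_back(static_cast<typename BytesStringType::value_type>((hi << 4) | lo));
    }

    return true;
}
```

Passing clear_bytes = false appends to whatever the output already holds, which matches the default-true parameter in the declared signature.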
Tomasz Sowa a524dfa2a7 add Space::to_float(...), to_double(...) and to_long_double(...) methods 2022-07-08 21:59:39 +02:00
Tomasz Sowa b81daf9fb6 set 2-Clause BSD licence in *.cpp files 2022-06-30 13:44:21 +02:00
Tomasz Sowa 74230d667b change headerfile_picotools_* macros to headerfile_pikotools_* 2022-06-30 12:45:08 +02:00
Tomasz Sowa dad8042c41 add pikotools/version.h file 2022-06-30 12:44:06 +02:00
Tomasz Sowa cadba907b2 change licence from 3-Clause BSD to 2-Clause BSD 2022-06-30 12:09:22 +02:00
Tomasz Sowa 4933378ed6 make depend 2022-06-26 05:40:44 +02:00
Tomasz Sowa 44bda888b5 fix: do not unescape xml sequences in filter mode 2022-06-01 05:17:30 +02:00
Tomasz Sowa 68fe25c8bf add limits when parsing a json/space format
while here:
- add column index error
- add parsing methods with pt::TextStream and pt::WTextStream arguments
2022-05-30 01:01:14 +02:00
Tomasz Sowa a40bab0445 add Space::get_table_item() method 2022-05-30 00:55:38 +02:00
Tomasz Sowa c3b7ab5793 add min_width parameter to methods converting int to string 2022-05-28 06:06:32 +02:00
Tomasz Sowa 5d2788d0d8 add Log::put_multiline() methods 2022-05-25 19:57:35 +02:00
Tomasz Sowa 72c10b20fb flush logs when printing to stdout 2022-04-27 22:07:58 +02:00
Tomasz Sowa 3173042229 make depend 2022-04-26 23:47:27 +02:00
Tomasz Sowa 5253963c84 fix: put a white char before an opening tag in tree mode if it was in the source html 2022-02-08 16:34:54 +01:00
Tomasz Sowa 0100c7e453 fix: check correctly for new lines when filtering html 2022-02-08 14:52:50 +01:00
Tomasz Sowa ac3c59323b add methods: try_esc_to_json(wchar_t val, stream), try_esc_to_xml(...), try_esc_to_csv(...)
Those methods return true if the val character was escaped and put
to the out stream. If the character is invalid for such a stream,
they only return true without putting it to the stream.
2022-02-04 14:19:54 +01:00
Tomasz Sowa 3b9b464bb7 fix: add typename keyword in TextStreamBase<> in some places 2022-02-03 19:21:22 +01:00
Tomasz Sowa 6b97b1b74a fix: correctly escape json/xml/csv wide strings
A wide string was first changed to utf-8 and then escaped to json/xml/csv,
which is incorrect. It should first be escaped and then changed to utf-8.

Add TextStreamBase<>::iterator and TextStreamBase<>::const_iterator as classes
with a method wchar_t get_unicode_and_advance(const iterator & end)
to return one character either from utf-8 stream or from wide stream.

Let TextStreamBase<>::operator<<(wchar_t v) correctly use utf-8.
2022-02-03 19:08:21 +01:00
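The ordering described in this commit (escape each character first, convert to utf-8 afterwards) can be sketched as follows. This is an illustration only, not the TextStreamBase<> code: append_utf8_sketch and esc_to_json_utf8_sketch are hypothetical names, std::u32string is used instead of a wide string so that each element is one full code point, and the set of escaped characters is a small assumed subset of JSON escaping.

```cpp
#include <cstdio>
#include <string>

// Appends one code point to 'out' as utf-8.
// Surrogates and values above U+10FFFF are not validated in this sketch.
void append_utf8_sketch(std::string & out, char32_t c)
{
    if( c < 0x80 )
    {
        out += static_cast<char>(c);
    }
    else if( c < 0x800 )
    {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
    else if( c < 0x10000 )
    {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
}

// The escaping decision is made per code point; only characters that
// need no escaping are then encoded as utf-8.
std::string esc_to_json_utf8_sketch(const std::u32string & in)
{
    std::string out;

    for(char32_t c : in)
    {
        switch(c)
        {
        case U'"':  out += "\\\""; break;
        case U'\\': out += "\\\\"; break;
        case U'\n': out += "\\n";  break;
        case U'\r': out += "\\r";  break;
        case U'\t': out += "\\t";  break;
        default:
            if( c < 0x20 )
            {
                // remaining control characters become \u00XX
                char buf[8];
                std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
                out += buf;
            }
            else
            {
                append_utf8_sketch(out, c);
            }
        }
    }

    return out;
}
```

Because the switch operates on whole code points, the escaping logic never has to inspect individual bytes of a multi-byte utf-8 sequence.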
Tomasz Sowa fd1a8270cd read CDATA as an ordinary text 2022-01-18 19:36:40 +01:00
Tomasz Sowa b781948f21 HTMLParser now correctly parses these entities: &amp; &lt; &gt; &quot; &apos; 2021-12-02 17:44:41 +01:00
Tomasz Sowa 2dadfc0809 added: HTMLParser::ItemParsedListener listener with an item_parsed(...) method which is called when a tag is parsed by the parser 2021-11-30 16:27:27 +01:00
Tomasz Sowa bb9205a55e added: Space::Space(const Date & date), Space::set(const Date & date), Space::add(const Date & date), Space::add(const wchar_t * field, const Date & date) 2021-11-05 09:27:32 +01:00
Tomasz Sowa 5eff9a5f4f Space::to_bool() now returns true when a string, object or table is non-empty 2021-10-20 08:30:57 +02:00
Tomasz Sowa c54c398828 fixed in HTMLParser: </nofilter> tag was printed 2021-10-13 00:40:55 +02:00
Tomasz Sowa 17d2c0fb25 - added some converting methods: esc_to_json(...), esc_to_xml(...), esc_to_csv() (convert/misc.h)
- BaseParser: added possibility to read from TextStream and WTextStream
- HTMLParser: added filter(const WTextStream & in, Stream & out, ...) method
- added utf8_stream.h with one method:
  template<typename StreamIteratorType>
  size_t utf8_to_int(
    StreamIteratorType & iterator_in,
    StreamIteratorType & iterator_end,
    int & res,
    bool & correct)
2021-10-12 19:53:11 +02:00
Tomasz Sowa 4902eb6037 fixed: in HTMLParser::CheckClosingTags() don't return immediately if stack_len is equal to 2 2021-10-03 13:22:49 +02:00