Tomasz Sowa
4f07c00217
Merge branch 'api2021'
2022-06-25 17:52:42 +02:00
Tomasz Sowa
44bda888b5
fix: do not unescape xml sequences in filter mode
2022-06-01 05:17:30 +02:00
Tomasz Sowa
68fe25c8bf
add limits when parsing a json/space format
while here:
- add column index error
- add parsing methods with pt::TextStream and pt::WTextStream arguments
2022-05-30 01:01:14 +02:00
Tomasz Sowa
a40bab0445
add Space::get_table_item() method
2022-05-30 00:55:38 +02:00
Tomasz Sowa
c3b7ab5793
add min_width parameter to methods converting int to string
2022-05-28 06:06:32 +02:00
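A minimal sketch of what a min_width parameter can mean when converting an integer to a string. The name int_to_str_padded and the choice of '0' as the padding character are assumptions made for illustration; the commit does not show the real signatures.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not the library's API: converts value to decimal and
// left-pads the digits with '0' until there are at least min_width of them.
// The minus sign is not counted as part of the width (an assumption).
std::string int_to_str_padded(long long value, std::size_t min_width)
{
    const bool negative = value < 0;

    // work on the unsigned magnitude so the most negative value does not overflow
    unsigned long long magnitude = static_cast<unsigned long long>(value);
    if (negative)
        magnitude = 0ULL - magnitude;

    std::string digits;
    do {
        digits.insert(digits.begin(), static_cast<char>('0' + magnitude % 10));
        magnitude /= 10;
    } while (magnitude != 0);

    if (digits.size() < min_width)
        digits = std::string(min_width - digits.size(), '0') + digits;

    return negative ? "-" + digits : digits;
}

// Small usage check.
void int_to_str_padded_example()
{
    assert(int_to_str_padded(7, 3) == "007");
    assert(int_to_str_padded(12345, 3) == "12345"); // already wide enough
}
```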
Tomasz Sowa
5d2788d0d8
add Log::put_multiline() methods
2022-05-25 19:57:35 +02:00
Tomasz Sowa
72c10b20fb
flush logs when printing to stdout
2022-04-27 22:07:58 +02:00
Tomasz Sowa
3173042229
make depend
2022-04-26 23:47:27 +02:00
Tomasz Sowa
5253963c84
fix: put a white char before an opening tag in tree mode if it was in the source html
2022-02-08 16:34:54 +01:00
Tomasz Sowa
0100c7e453
fix: check correctly for new lines when filtering html
2022-02-08 14:52:50 +01:00
Tomasz Sowa
ac3c59323b
add methods: try_esc_to_json(wchar_t val, stream) try_esc_to_xml(...) try_esc_to_csv(...)
Those methods return true if the val character was escaped and put
to the out stream. If the character is invalid for such a stream
they only return true without putting it to the stream.
2022-02-04 14:19:54 +01:00
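The return value contract described above can be sketched as follows. This is not the library code: the names try_esc_to_json_sketch and esc_to_json_sketch, the use of std::wstring as the output, and the decision to treat only the null character as invalid are all assumptions.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a try_esc_to_json-like function.
// Returns true when the caller has nothing more to do with val: either an
// escape sequence was appended to out, or val was considered invalid and
// skipped. Returns false when val needs no escaping and the caller should
// append it unchanged.
bool try_esc_to_json_sketch(wchar_t val, std::wstring & out)
{
    switch (val) {
    case L'"':  out += L"\\\""; return true;
    case L'\\': out += L"\\\\"; return true;
    case L'\n': out += L"\\n";  return true;
    case L'\r': out += L"\\r";  return true;
    case L'\t': out += L"\\t";  return true;
    case 0:     return true; // assumption: a null character is invalid and dropped
    default:    return false;
    }
}

// A caller built on top of the function above.
std::wstring esc_to_json_sketch(const std::wstring & in)
{
    std::wstring out;

    for (wchar_t c : in) {
        if (!try_esc_to_json_sketch(c, out))
            out += c; // ordinary character, copied as is
    }

    return out;
}

// Small usage check.
void esc_to_json_sketch_example()
{
    assert(esc_to_json_sketch(L"a\"b\n") == L"a\\\"b\\n");
}
```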
Tomasz Sowa
3b9b464bb7
fix: add typename keyword in TextStreamBase<> in some places
2022-02-03 19:21:22 +01:00
Tomasz Sowa
6b97b1b74a
fix: correctly escape json/xml/csv wide strings
A wide string was first converted to utf-8 and then escaped to json/xml/csv,
which is incorrect. It should be escaped first and then converted to utf-8.
Add TextStreamBase<>::iterator and TextStreamBase<>::const_iterator as classes
with a method wchar_t get_unicode_and_advance(const iterator & end)
to return one character either from a utf-8 stream or from a wide stream.
Let TextStreamBase<>::operator<<(wchar_t v) correctly use utf-8.
2022-02-03 19:08:21 +01:00
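The ordering described above (escape each whole character first, encode to utf-8 afterwards) can be sketched like this. The function names are hypothetical and char32_t code points are used instead of wchar_t to keep the sketch independent of the platform's wchar_t size; the real change is built on TextStreamBase<> iterators, which are not reproduced here.

```cpp
#include <cassert>
#include <string>

// Appends the UTF-8 encoding of one code point.
// Assumes c is a valid scalar value (<= 0x10FFFF, not a surrogate).
void append_utf8(char32_t c, std::string & out)
{
    if (c < 0x80) {
        out += static_cast<char>(c);
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
}

// The escaping decision is made per whole character; only characters that
// need no escaping are encoded to UTF-8 bytes.
std::string json_escape_then_utf8(const std::u32string & in)
{
    std::string out;

    for (char32_t c : in) {
        if (c == U'"')       out += "\\\"";
        else if (c == U'\\') out += "\\\\";
        else if (c == U'\n') out += "\\n";
        else                 append_utf8(c, out);
    }

    return out;
}

// Small usage check: U+0105 followed by a double quote.
void json_escape_then_utf8_example()
{
    assert(json_escape_then_utf8(U"\u0105\"") == "\xC4\x85\\\"");
}
```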
Tomasz Sowa
fd1a8270cd
read CDATA as an ordinary text
2022-01-18 19:36:40 +01:00
Tomasz Sowa
b781948f21
HTMLParser now correctly parses these entities: &amp; &lt; &gt; &quot; &apos;
2021-12-02 17:44:41 +01:00
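For illustration, decoding the five predefined entities (&amp; &lt; &gt; &quot; &apos;) can be sketched as below. The function decode_basic_entities is a hypothetical helper working on std::string; it is not how HTMLParser is implemented.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replaces the five predefined entities with their
// characters in a single pass. Anything else, including unknown entities,
// is copied unchanged.
std::string decode_basic_entities(const std::string & in)
{
    static const struct { const char * name; char value; } entities[] = {
        {"&amp;", '&'}, {"&lt;", '<'}, {"&gt;", '>'}, {"&quot;", '"'}, {"&apos;", '\''}
    };

    std::string out;
    std::size_t i = 0;

    while (i < in.size()) {
        bool matched = false;

        if (in[i] == '&') {
            for (const auto & e : entities) {
                const std::size_t len = std::char_traits<char>::length(e.name);

                if (in.compare(i, len, e.name) == 0) {
                    out += e.value;
                    i += len;
                    matched = true;
                    break;
                }
            }
        }

        if (!matched)
            out += in[i++];
    }

    return out;
}

// Small usage check.
void decode_basic_entities_example()
{
    assert(decode_basic_entities("a &lt; b &amp;&amp; c") == "a < b && c");
}
```

Because the input is scanned once, "&amp;lt;" becomes "&lt;" and is not decoded a second time.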
Tomasz Sowa
2dadfc0809
added: HTMLParser::ItemParsedListener listener with an item_parsed(...) method which is called when a tag is parsed by the parser
2021-11-30 16:27:27 +01:00
Tomasz Sowa
bb9205a55e
added: Space::Space(const Date & date), Space::set(const Date & date), Space::add(const Date & date), Space::add(const wchar_t * field, const Date & date)
2021-11-05 09:27:32 +01:00
Tomasz Sowa
5eff9a5f4f
Space::to_bool() now returns true when a string, object or table is non-empty
2021-10-20 08:30:57 +02:00
Tomasz Sowa
c54c398828
fixed in HTMLParser: </nofilter> tag was printed
2021-10-13 00:40:55 +02:00
Tomasz Sowa
17d2c0fb25
- added some converting methods: esc_to_json(...), esc_to_xml(...), esc_to_csv() (convert/misc.h)
- BaseParser: added possibility to read from TextStream and WTextStream
- HTMLParser: added filter(const WTextStream & in, Stream & out, ...) method
- added utf8_stream.h with one method:
template<typename StreamIteratorType>
size_t utf8_to_int(
StreamIteratorType & iterator_in,
StreamIteratorType & iterator_end,
int & res,
bool & correct)
2021-10-12 19:53:11 +02:00
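A rough sketch of a decoder with the same shape as the utf8_to_int signature listed above. The name utf8_to_int_sketch is hypothetical, and the meaning of the return value (the number of bytes consumed) and of the correct flag (false for truncated, overlong, surrogate or out of range input) are assumptions, as the commit message does not describe them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Reads one UTF-8 encoded code point from [it, end) and advances it.
// Assumed contract: returns the number of bytes consumed, stores the code
// point in res and sets correct to false when the sequence is malformed.
template<typename StreamIteratorType>
std::size_t utf8_to_int_sketch(StreamIteratorType & it, const StreamIteratorType & end,
                               int & res, bool & correct)
{
    res = 0;
    correct = false;

    if (it == end)
        return 0;

    const unsigned char first = static_cast<unsigned char>(*it);
    ++it;

    std::size_t len = 0;

    if (first < 0x80)                { res = first; correct = true; return 1; }
    else if ((first & 0xE0) == 0xC0) { len = 2; res = first & 0x1F; }
    else if ((first & 0xF0) == 0xE0) { len = 3; res = first & 0x0F; }
    else if ((first & 0xF8) == 0xF0) { len = 4; res = first & 0x07; }
    else                             { return 1; } // invalid lead byte

    std::size_t read = 1;

    for ( ; read < len ; ++read) {
        if (it == end)
            return read; // truncated sequence

        const unsigned char next = static_cast<unsigned char>(*it);

        if ((next & 0xC0) != 0x80)
            return read; // not a continuation byte, leave it unread

        ++it;
        res = (res << 6) | (next & 0x3F);
    }

    // reject overlong forms, surrogates and values above U+10FFFF
    static const int min_value[] = {0, 0, 0x80, 0x800, 0x10000};
    correct = res >= min_value[len] && res <= 0x10FFFF && !(res >= 0xD800 && res <= 0xDFFF);

    return read;
}

// Small usage check: U+0105 (two bytes) followed by 'z'.
void utf8_to_int_sketch_example()
{
    const std::string text = "\xC4\x85z";
    std::string::const_iterator it = text.begin(), end = text.end();
    int code = 0;
    bool ok = false;

    assert(utf8_to_int_sketch(it, end, code, ok) == 2 && ok && code == 0x105);
    assert(utf8_to_int_sketch(it, end, code, ok) == 1 && ok && code == 'z');
}
```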
Tomasz Sowa
4902eb6037
fixed: in HTMLParser::CheckClosingTags() don't return immediately if stack_len is equal to 2
2021-10-03 13:22:49 +02:00
Tomasz Sowa
5e4c7e9929
make depend
2021-10-02 21:01:19 +02:00
Tomasz Sowa
abe349be34
small refactoring in HTMLParser
2021-10-02 21:01:09 +02:00
Tomasz Sowa
f23cabfb2f
added to HTMLParser: filter_file(...) methods for filtering from a file
2021-10-02 20:34:19 +02:00
Tomasz Sowa
5b2583b566
fixed in HTMLParser: sometimes a closing item was left on the stack, for stack_len < 3 PopStack() was not called
2021-10-02 18:45:02 +02:00
Tomasz Sowa
2cc9dd69a3
make depend
2021-08-12 21:53:52 +02:00
Tomasz Sowa
2576eb12d1
HTMLParser: start working on xml mode
added methods:
Status parse_xml_file(const char * file_name, Space & out_space, bool compact_mode = false, bool clear_space = true);
Status parse_xml_file(const std::string & file_name, Space & out_space, bool compact_mode = false, bool clear_space = true);
Status parse_xml_file(const wchar_t * file_name, Space & out_space, bool compact_mode = false, bool clear_space = true);
Status parse_xml_file(const std::wstring & file_name, Space & out_space, bool compact_mode = false, bool clear_space = true);
2021-08-10 21:56:04 +02:00
Tomasz Sowa
b1cc64a29b
added a compact_mode option when creating a space output
2021-08-10 01:45:10 +02:00
Tomasz Sowa
b8a03bf852
HTMLParser: added possibility to parse html to Space class
added method: HTMLParser::parse_html(const wchar_t * in, Space & space)
2021-08-07 21:21:16 +02:00
Tomasz Sowa
7fcfdac52f
Space: added pretty_print parameter to some json serializing methods
2021-08-07 21:19:38 +02:00
Tomasz Sowa
8c5ede5cf3
HTMLParser: for <script> and <!-- (comments) we copy the content without parsing
2021-08-07 02:13:13 +02:00
Tomasz Sowa
fdfd0b1385
renamed: HTMLFilter -> HTMLParser
2021-08-06 17:10:19 +02:00
Tomasz Sowa
f6df8bc1bc
HTMLFilter: added a std::vector<int> stack for a current white mode - white chars mode can be changed by such tags: <textarea>, <pre>, <script>, <nofilter>
2021-07-21 15:57:46 +02:00
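The idea of a stack of white character modes can be sketched as follows. The class WhiteModeStackSketch is hypothetical, the mode names are the ones listed in the white_chars_mode() commit in this log, and the rule that the four listed tags switch to WHITE_MODE_ORIGIN is an assumption; the commit only says that those tags can change the mode.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum WhiteMode { WHITE_MODE_ORIGIN = 0, WHITE_MODE_SINGLE_LINE = 1, WHITE_MODE_TREE = 2 };

// Hypothetical helper: the top of the stack is the mode for the current tag.
class WhiteModeStackSketch
{
public:
    explicit WhiteModeStackSketch(int initial_mode) { modes.push_back(initial_mode); }

    int current() const { return modes.back(); }

    // assumption: whitespace sensitive tags keep the source white characters,
    // every other tag inherits the mode of its parent
    void tag_opened(const std::string & name)
    {
        modes.push_back(keeps_white_chars(name) ? WHITE_MODE_ORIGIN : current());
    }

    // the initial mode is never popped
    void tag_closed()
    {
        if (modes.size() > 1)
            modes.pop_back();
    }

private:
    static bool keeps_white_chars(const std::string & name)
    {
        return name == "textarea" || name == "pre" || name == "script" || name == "nofilter";
    }

    std::vector<int> modes;
};

// Small usage check.
void white_mode_stack_example()
{
    WhiteModeStackSketch stack(WHITE_MODE_TREE);

    stack.tag_opened("pre");
    assert(stack.current() == WHITE_MODE_ORIGIN);

    stack.tag_closed();
    assert(stack.current() == WHITE_MODE_TREE);
}
```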
Tomasz Sowa
c0e940c500
fixed improper new line character after <single/> items, added Item::new_line_before flag
2021-07-21 11:30:49 +02:00
Tomasz Sowa
4f8ae6ce29
some work in HTMLFilter
- instead of directly using pchar pointer now we use pointers/streams from BaseParser
- removed support for putting a white char in long words: removed BreakWord(size_t break_after_) method
- changed how white characters are treated: added white_chars_mode(int mode) method
mode 0: WHITE_MODE_ORIGIN
mode 1: WHITE_MODE_SINGLE_LINE
mode 2: WHITE_MODE_TREE
2021-07-20 20:48:01 +02:00
Tomasz Sowa
7ce07c57f5
added a base class for parsers: BaseParser (convert/baseparser.h|cpp)
there are methods for reading from string/files there
those methods were moved from SpaceParser and CSVParser
fixed: CSVParser didn't set input_as_utf8 flag
2021-07-17 14:38:22 +02:00
Tomasz Sowa
2a3f43c5c3
added BBCODEParser (html/bbcodeparser.h|cpp) - copied from winix project
2021-07-17 13:54:03 +02:00
Tomasz Sowa
bdb2616f32
added: HTMLFilter (html/htmlfilter.h|cpp) - copied from winix project
2021-07-17 13:35:10 +02:00
Tomasz Sowa
6c41e0a803
Merge branch 'api2021'
2021-07-06 22:45:54 +02:00
Tomasz Sowa
1e5598cde1
added to Date: SerializeMonthAsRoman(Stream & out, int month) - serialize month in Roman numerals
added a param: 'bool roman_month' to some serialize methods
2021-07-06 21:44:04 +02:00
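Serializing a month in Roman numerals comes down to a twelve element lookup. The sketch below uses a hypothetical free function month_as_roman returning std::string; the real SerializeMonthAsRoman writes to a Stream, and how it treats an out of range month is not stated in the commit.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: month is expected in the range 1..12.
// Returning an empty string for other values is an assumption.
std::string month_as_roman(int month)
{
    static const char * const roman[] = {
        "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X", "XI", "XII"
    };

    if (month < 1 || month > 12)
        return std::string();

    return roman[month - 1];
}

// Small usage check.
void month_as_roman_example()
{
    assert(month_as_roman(7) == "VII");
}
```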
Tomasz Sowa
198945c97b
PatternReplacerBase: to_string() changed to to_str()
2021-07-06 21:42:42 +02:00
Tomasz Sowa
34f1fc04cf
added Space::remove(size_t table_index) for removing a table item
fixed: pretty printing for Space format
2021-06-29 23:25:31 +02:00
Tomasz Sowa
8997284b16
added trim(...) functions to convert/text.h
void trim_first_white(std::string & str, bool check_additional_chars = true, bool treat_new_line_as_white = true);
void trim_first_white(std::wstring & str, bool check_additional_chars = true, bool treat_new_line_as_white = true);
void trim_last_white(std::string & str, bool check_additional_chars = true, bool treat_new_line_as_white = true);
void trim_last_white(std::wstring & str, bool check_additional_chars = true, bool treat_new_line_as_white = true);
void trim_white(std::string & str, bool check_additional_chars = true, bool treat_new_line_as_white = true);
void trim_white(std::wstring & str, bool check_additional_chars = true, bool treat_new_line_as_white = true);
void trim_first(std::string & str, wchar_t c);
void trim_first(std::wstring & str, wchar_t c);
void trim_last(std::string & str, wchar_t c);
void trim_last(std::wstring & str, wchar_t c);
void trim(std::string & str, wchar_t c);
void trim(std::wstring & str, wchar_t c);
2021-06-29 23:23:35 +02:00
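A minimal sketch of the trimming behaviour for std::string. The names trim_white_sketch and is_white_sketch are hypothetical. Only the treat_new_line_as_white flag is modelled: the check_additional_chars parameter is left out because its meaning is not described in the commit, and treating space and tab as the basic white characters is an assumption.

```cpp
#include <cassert>
#include <string>

// Assumption: space and tab are always white, '\n' and '\r' only when
// treat_new_line_as_white is set.
static bool is_white_sketch(char c, bool treat_new_line_as_white)
{
    if (c == ' ' || c == '\t')
        return true;

    return treat_new_line_as_white && (c == '\n' || c == '\r');
}

// Removes white characters from both ends of str, in place.
void trim_white_sketch(std::string & str, bool treat_new_line_as_white = true)
{
    std::size_t first = 0;

    while (first < str.size() && is_white_sketch(str[first], treat_new_line_as_white))
        ++first;

    std::size_t last = str.size();

    while (last > first && is_white_sketch(str[last - 1], treat_new_line_as_white))
        --last;

    str.erase(last);     // drop the tail first so that indices stay valid
    str.erase(0, first); // then drop the head
}

// Small usage check.
void trim_white_sketch_example()
{
    std::string text = "  \tabc \n";
    trim_white_sketch(text);
    assert(text == "abc");
}
```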
Tomasz Sowa
e31ef3c6c4
make depend
2021-06-27 22:34:05 +02:00
Tomasz Sowa
e0d6e7fcb1
added to Space:
Space & get_add_space(const wchar_t * field);
Space & get_add_space(const std::wstring & field);
2021-06-27 15:58:53 +02:00
Tomasz Sowa
009e240a8d
fixed some memory leaks in Space, pointers in tables and objects were not correctly 'deleted', affected methods:
set_empty_table()
set_empty_object()
add(const wchar_t * field, Space && space)
copy_value_object(const Value & value_from)
copy_value_table(const Value & value_from)
initialize_value_object_if_needed(ObjectType && obj)
initialize_value_table_if_needed(TableType && tab)
add_generic(const wchar_t * field, const ArgType & val)
2021-06-27 15:41:38 +02:00
Tomasz Sowa
4a1630b1ea
removed support for so-called child objects from Space (this was an old feature of the Space struct, no longer needed)
Space::get_object_field(...) renamed to Space::get_space(...)
2021-06-26 22:56:12 +02:00
Tomasz Sowa
8ec9350d52
added two functions to utf8:
template<typename StreamType> bool utf8_to_wide(const Stream & stream, StreamType & res, bool clear = true, int mode = 1);
template<typename StreamType> bool wide_stream_to_utf8(const Stream & stream, StreamType & utf8, bool clear = true, int mode = 1);
these functions were moved from TextStreamBase
2021-06-25 19:10:01 +02:00
Tomasz Sowa
792057a869
make depend
2021-06-24 21:18:48 +02:00
Tomasz Sowa
4d9f5f6c55
Log class has the Stream class as a base class now
- implemented some missing operators<<(...)
- removed Manipulators: l1, l2, l3, l4, lend, lsave
- PascalCase to snake_case in Log
added to Stream:
virtual bool is_char_stream() const = 0;
virtual bool is_wchar_stream() const = 0;
virtual char get_char(size_t index) const = 0;
virtual wchar_t get_wchar(size_t index) const = 0;
virtual Stream & operator<<(const Stream & stream) = 0;
2021-06-24 20:52:48 +02:00
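To show how the pure virtual methods listed above fit together, here is a stand-alone sketch. StreamSketch and CharStreamSketch are hypothetical classes that only mirror the five listed signatures; size() and str() are additions needed by the example and are assumptions, not part of the commit's list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical interface mirroring the methods added to Stream.
class StreamSketch
{
public:
    virtual ~StreamSketch() = default;

    virtual bool is_char_stream() const = 0;
    virtual bool is_wchar_stream() const = 0;
    virtual char get_char(std::size_t index) const = 0;
    virtual wchar_t get_wchar(std::size_t index) const = 0;
    virtual StreamSketch & operator<<(const StreamSketch & stream) = 0;

    virtual std::size_t size() const = 0; // assumption, not in the commit's list
};

// A narrow character stream backed by std::string.
class CharStreamSketch : public StreamSketch
{
public:
    explicit CharStreamSketch(const std::string & text = std::string()) : buffer(text) {}

    bool is_char_stream() const override { return true; }
    bool is_wchar_stream() const override { return false; }
    char get_char(std::size_t index) const override { return buffer[index]; }

    wchar_t get_wchar(std::size_t index) const override
    {
        return static_cast<wchar_t>(static_cast<unsigned char>(buffer[index]));
    }

    std::size_t size() const override { return buffer.size(); }

    // Appends another stream character by character. Wide characters are
    // narrowed without any conversion, which is lossy outside ASCII; a real
    // implementation would convert to utf-8 instead.
    StreamSketch & operator<<(const StreamSketch & stream) override
    {
        const std::size_t count = stream.size(); // fixed up front, safe for self-append

        for (std::size_t i = 0 ; i < count ; ++i) {
            buffer += stream.is_char_stream() ? stream.get_char(i)
                                              : static_cast<char>(stream.get_wchar(i));
        }

        return *this;
    }

    const std::string & str() const { return buffer; }

private:
    std::string buffer;
};

// Small usage check.
void char_stream_sketch_example()
{
    CharStreamSketch a("ab"), b("cd");
    a << b;
    assert(a.str() == "abcd");
}
```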