Tomasz Sowa
44bda888b5
fix: do not unescape xml sequences in filter mode
2022-06-01 05:17:30 +02:00
Tomasz Sowa
5253963c84
fix: put a white char before an opening tag in tree mode if it was in the source html
2022-02-08 16:34:54 +01:00
Tomasz Sowa
0100c7e453
fix: check correctly for new lines when filtering html
2022-02-08 14:52:50 +01:00
Tomasz Sowa
fd1a8270cd
read CDATA as ordinary text
2022-01-18 19:36:40 +01:00
Tomasz Sowa
b781948f21
HTMLParser now correctly parses these entities: &amp; &lt; &gt; &quot; &apos;
2021-12-02 17:44:41 +01:00
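Note on b781948f21: a minimal sketch of decoding the five predefined XML entities (&amp;amp; &amp;lt; &amp;gt; &amp;quot; &amp;apos;). This is not the code from HTMLParser; the function name decode_predefined_entities is hypothetical, and the single-pass, no-double-decoding behaviour is an assumption made for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of HTMLParser): replaces the five predefined
// XML entities with their characters in a single left-to-right pass.
std::string decode_predefined_entities(const std::string & in)
{
    static const struct { const char * name; char value; } entities[] = {
        {"&amp;", '&'}, {"&lt;", '<'}, {"&gt;", '>'}, {"&quot;", '"'}, {"&apos;", '\''}
    };

    std::string out;
    out.reserve(in.size());

    for(std::size_t i = 0 ; i < in.size() ; )
    {
        bool matched = false;

        if( in[i] == '&' )
        {
            for(const auto & e : entities)
            {
                std::size_t len = std::char_traits<char>::length(e.name);

                // compare() only returns 0 when the whole entity name is present at position i
                if( in.compare(i, len, e.name) == 0 )
                {
                    out += e.value;
                    i += len;
                    matched = true;
                    break;
                }
            }
        }

        // anything that is not a known entity is copied unchanged
        if( !matched )
            out += in[i++];
    }

    return out;
}
```

Because the input is scanned once, the output of a replacement is never re-examined, so "&amp;amp;lt;" becomes "&amp;lt;" rather than "<". Unknown sequences such as "&amp;nbsp;" pass through untouched in this sketch.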
Tomasz Sowa
2dadfc0809
added: HTMLParser::ItemParsedListener listener with an item_parsed(...) method which is called when a tag is parsed by the parser
2021-11-30 16:27:27 +01:00
Tomasz Sowa
c54c398828
fixed in HTMLParser: </nofilter> tag was printed
2021-10-13 00:40:55 +02:00
Tomasz Sowa
17d2c0fb25
- added some converting methods: esc_to_json(...), esc_to_xml(...), esc_to_csv() (convert/misc.h)
- BaseParser: added possibility to read from TextStream and WTextStream
- HTMLParser: added filter(const WTextStream & in, Stream & out, ...) method
- added utf8_stream.h with one method:
  template<typename StreamIteratorType>
  size_t utf8_to_int(
    StreamIteratorType & iterator_in,
    StreamIteratorType & iterator_end,
    int & res,
    bool & correct)
2021-10-12 19:53:11 +02:00
Tomasz Sowa
4902eb6037
fixed: in HTMLParser::CheckClosingTags() don't return immediately if stack_len is equal to 2
2021-10-03 13:22:49 +02:00
Tomasz Sowa
abe349be34
small refactoring in HTMLParser
2021-10-02 21:01:09 +02:00
Tomasz Sowa
f23cabfb2f
added to HTMLParser: filter_file(...) methods for filtering from a file
2021-10-02 20:34:19 +02:00
Tomasz Sowa
5b2583b566
fixed in HTMLParser: a closing item was sometimes left on the stack; for stack_len < 3 PopStack() was not called
2021-10-02 18:45:02 +02:00
Tomasz Sowa
2576eb12d1
HTMLParser: start working on xml mode
added methods:
  Status parse_xml_file(const char * file_name, Space & out_space, bool compact_mode = false, bool clear_space = true);
  Status parse_xml_file(const std::string & file_name, Space & out_space, bool compact_mode = false, bool clear_space = true);
  Status parse_xml_file(const wchar_t * file_name, Space & out_space, bool compact_mode = false, bool clear_space = true);
  Status parse_xml_file(const std::wstring & file_name, Space & out_space, bool compact_mode = false, bool clear_space = true);
2021-08-10 21:56:04 +02:00
Tomasz Sowa
b1cc64a29b
added a compact_mode option when creating a space output
2021-08-10 01:45:10 +02:00
Tomasz Sowa
b8a03bf852
HTMLParser: added possibility to parse html to Space class
added method: HTMLParser::parse_html(const wchar_t * in, Space & space)
2021-08-07 21:21:16 +02:00
Tomasz Sowa
8c5ede5cf3
HTMLParser: for <script> and <!-- (comments) we copy the content without parsing
2021-08-07 02:13:13 +02:00
Tomasz Sowa
fdfd0b1385
renamed: HTMLFilter -> HTMLParser
2021-08-06 17:10:19 +02:00
Tomasz Sowa
f6df8bc1bc
HTMLFilter: added a std::vector<int> stack for a current white mode - white chars mode can be changed by such tags: <textarea>, <pre>, <script>, <nofilter>
2021-07-21 15:57:46 +02:00
Tomasz Sowa
c0e940c500
fixed improper new line character after <single/> items, added Item::new_line_before flag
2021-07-21 11:30:49 +02:00
Tomasz Sowa
4f8ae6ce29
some work in HTMLFilter
- instead of using the pchar pointer directly, we now use pointers/streams from BaseParser
- removed support for putting a white char in long words: removed BreakWord(size_t break_after_) method
- changed how white characters are treated: added white_chars_mode(int mode) method
  mode 0: WHITE_MODE_ORIGIN
  mode 1: WHITE_MODE_SINGLE_LINE
  mode 2: WHITE_MODE_TREE
2021-07-20 20:48:01 +02:00
Tomasz Sowa
2a3f43c5c3
added BBCODEParser (html/bbcodeparser.h|cpp) - copied from winix project
2021-07-17 13:54:03 +02:00
Tomasz Sowa
bdb2616f32
added: HTMLFilter (html/htmlfilter.h|cpp) - copied from winix project
2021-07-17 13:35:10 +02:00