Commit Graph

17 Commits

Author SHA1 Message Date
Tomasz Sowa 44bda888b5 fix: do not unescape xml sequences in filter mode 2022-06-01 05:17:30 +02:00
Tomasz Sowa 5253963c84 fix: put a whitespace character before an opening tag in tree mode if there was one in the source HTML 2022-02-08 16:34:54 +01:00
Tomasz Sowa 0100c7e453 fix: check correctly for new lines when filtering html 2022-02-08 14:52:50 +01:00
Tomasz Sowa fd1a8270cd read CDATA as ordinary text 2022-01-18 19:36:40 +01:00
Tomasz Sowa b781948f21 HTMLParser now correctly parses these entities: &amp; &lt; &gt; &quot; &apos; 2021-12-02 17:44:41 +01:00
Tomasz Sowa 2dadfc0809 added: HTMLParser::ItemParsedListener listener with an item_parsed(...) method which is called when a tag is parsed by the parser 2021-11-30 16:27:27 +01:00
Tomasz Sowa c54c398828 fixed in HTMLParser: </nofilter> tag was printed 2021-10-13 00:40:55 +02:00
Tomasz Sowa 17d2c0fb25 - added some conversion methods: esc_to_json(...), esc_to_xml(...), esc_to_csv() (convert/misc.h)
- BaseParser: added the ability to read from TextStream and WTextStream
- HTMLParser: added filter(const WTextStream & in, Stream & out, ...) method
- added utf8_stream.h with one method:
  template<typename StreamIteratorType>
  size_t utf8_to_int(
    StreamIteratorType & iterator_in,
    StreamIteratorType & iterator_end,
    int & res,
    bool & correct)
2021-10-12 19:53:11 +02:00
Tomasz Sowa 4902eb6037 fixed: in HTMLParser::CheckClosingTags() don't return immediately if stack_len is equal to 2 2021-10-03 13:22:49 +02:00
Tomasz Sowa abe349be34 small refactoring in HTMLParser 2021-10-02 21:01:09 +02:00
Tomasz Sowa f23cabfb2f added to HTMLParser: filter_file(...) methods for filtering from a file 2021-10-02 20:34:19 +02:00
Tomasz Sowa 5b2583b566 fixed in HTMLParser: a closing item was sometimes left on the stack; for stack_len < 3, PopStack() was not called 2021-10-02 18:45:02 +02:00
Tomasz Sowa 2576eb12d1 HTMLParser: start working on xml mode
added methods:
Status parse_xml_file(const char * file_name,         Space & out_space, bool compact_mode = false, bool clear_space = true);
Status parse_xml_file(const std::string & file_name,  Space & out_space, bool compact_mode = false, bool clear_space = true);
Status parse_xml_file(const wchar_t * file_name,      Space & out_space, bool compact_mode = false, bool clear_space = true);
Status parse_xml_file(const std::wstring & file_name, Space & out_space, bool compact_mode = false, bool clear_space = true);
2021-08-10 21:56:04 +02:00
Tomasz Sowa b1cc64a29b added a compact_mode option when creating a space output 2021-08-10 01:45:10 +02:00
Tomasz Sowa b8a03bf852 HTMLParser: added the ability to parse HTML into a Space object
added method: HTMLParser::parse_html(const wchar_t * in, Space & space)
2021-08-07 21:21:16 +02:00
Tomasz Sowa 8c5ede5cf3 HTMLParser: for <script> and <!-- (comments) we copy the content without parsing 2021-08-07 02:13:13 +02:00
Tomasz Sowa fdfd0b1385 renamed: HTMLFilter -> HTMLParser 2021-08-06 17:10:19 +02:00