Tomasz Sowa
f35e2122ed
(HtmlParser): rename ItemParsedListener to Listener
...
while here:
- add a new callback method: bool should_remove(Item &)
2024-04-17 10:39:06 +02:00
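The should_remove(Item &) callback mentioned in the entry above could be used roughly as sketched below. This is only an illustration: the Item struct with a single name field, the ScriptRemover class and the keep_items(...) helper are hypothetical and are not taken from the library, and the real Listener has more to it than this one method.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the parser's Item type (the real one is richer).
struct Item
{
    std::string name;
};

// Sketch of a listener interface with the callback named in the commit message.
class Listener
{
public:
    virtual ~Listener() = default;

    // Return true if the parser should drop this item from its output.
    virtual bool should_remove(Item & item) = 0;
};

// Example listener (hypothetical): asks the parser to drop <script> items.
class ScriptRemover : public Listener
{
public:
    bool should_remove(Item & item) override
    {
        return item.name == "script";
    }
};

// Hypothetical driver: keeps only the items the listener does not reject.
std::vector<Item> keep_items(const std::vector<Item> & parsed, Listener & listener)
{
    std::vector<Item> kept;

    for (Item item : parsed) // work on a copy, the callback takes a non-const reference
    {
        if (!listener.should_remove(item))
            kept.push_back(item);
    }

    return kept;
}
```

Passing a list with the items p, script and div through keep_items(...) with a ScriptRemover leaves p and div.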
Tomasz Sowa
f02dd1093a
fix(HtmlParser): correctly remove an item from the space struct when requested from a callback
...
while here:
- implement the removing algorithm for the compact_mode
2024-04-16 09:35:47 +02:00
Tomasz Sowa
21614a5309
fix: memory leak in the HTMLParser when compact mode was used
2023-10-18 18:33:52 +02:00
Tomasz Sowa
7e92b5d9d7
add HTMLParser::parse_xml(...) methods
2023-07-04 22:58:43 +02:00
Tomasz Sowa
379adf6a69
allow the ParseTime() method to parse a decimal fraction of a time
...
while here:
- let ParseDate() parse formats without a separator, e.g. "20081012",
and without the month or day, e.g. "2008" or "200810"
- let ParseTime() parse a time without separators, e.g.
"141030", "1410" or just "14"
- let the Parse(...) method use ParseDate() and ParseTime();
this allows parsing a format similar to ISO 8601
2022-12-23 02:15:11 +01:00
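The separator-free formats listed in the entry above can be split purely by length. The sketch below shows one way to do that. It is not the library's ParseDate()/ParseTime() code: the DateParts and TimeParts structs and both function names are hypothetical, and defaulting a missing month or day to 1 (and missing minutes or seconds to 0) is an assumption.

```cpp
#include <cctype>
#include <string>

// Hypothetical result types, not the library's own date class.
struct DateParts { int year = 0; int month = 1; int day = 1; };
struct TimeParts { int hour = 0; int min = 0; int sec = 0; };

// Returns true if s is non-empty and consists only of decimal digits.
static bool all_digits(const std::string & s)
{
    for (unsigned char c : s)
    {
        if (!std::isdigit(c))
            return false;
    }

    return !s.empty();
}

// Parses "YYYY", "YYYYMM" or "YYYYMMDD"; out is modified only on success.
bool parse_compact_date(const std::string & s, DateParts & out)
{
    if (!all_digits(s) || (s.size() != 4 && s.size() != 6 && s.size() != 8))
        return false;

    DateParts d;
    d.year = std::stoi(s.substr(0, 4));
    if (s.size() >= 6) d.month = std::stoi(s.substr(4, 2));
    if (s.size() == 8) d.day = std::stoi(s.substr(6, 2));

    // Coarse range check only; days per month are not validated here.
    if (d.month < 1 || d.month > 12 || d.day < 1 || d.day > 31)
        return false;

    out = d;
    return true;
}

// Parses "HH", "HHMM" or "HHMMSS"; out is modified only on success.
bool parse_compact_time(const std::string & s, TimeParts & out)
{
    if (!all_digits(s) || (s.size() != 2 && s.size() != 4 && s.size() != 6))
        return false;

    TimeParts t;
    t.hour = std::stoi(s.substr(0, 2));
    if (s.size() >= 4) t.min = std::stoi(s.substr(2, 2));
    if (s.size() == 6) t.sec = std::stoi(s.substr(4, 2));

    if (t.hour > 23 || t.min > 59 || t.sec > 59)
        return false;

    out = t;
    return true;
}
```

A decimal fraction of a time, as mentioned in the first line of the commit message, is not handled by this sketch.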
Tomasz Sowa
b81daf9fb6
set 2-Clause BSD licence in *.cpp files
2022-06-30 13:44:21 +02:00
Tomasz Sowa
74230d667b
change headerfile_picotools_* macros to headerfile_pikotools_*
2022-06-30 12:45:08 +02:00
Tomasz Sowa
cadba907b2
change licence from 3-Clause BSD to 2-Clause BSD
2022-06-30 12:09:22 +02:00
Tomasz Sowa
44bda888b5
fix: do not unescape xml sequences in filter mode
2022-06-01 05:17:30 +02:00
Tomasz Sowa
5253963c84
fix: put a white character before an opening tag in tree mode if one was present in the source HTML
2022-02-08 16:34:54 +01:00
Tomasz Sowa
0100c7e453
fix: correctly check for new lines when filtering HTML
2022-02-08 14:52:50 +01:00
Tomasz Sowa
fd1a8270cd
read CDATA as an ordinary text
2022-01-18 19:36:40 +01:00
Tomasz Sowa
b781948f21
HTMLParser now correctly parses these entities: &amp; &lt; &gt; &quot; &apos;
2021-12-02 17:44:41 +01:00
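The five characters named in the entry above correspond to the predefined XML entities &amp;, &lt;, &gt;, &quot; and &apos;. A minimal sketch of decoding them is shown below. The function name decode_basic_entities is hypothetical and this is not the parser's actual code; unknown entities and a bare '&' are simply copied through, which is an assumption about the desired behaviour.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces the five predefined XML entities
// with the characters they stand for, in a single left-to-right pass.
std::string decode_basic_entities(const std::string & in)
{
    static const struct { const char * name; char ch; } table[] = {
        {"&amp;", '&'}, {"&lt;", '<'}, {"&gt;", '>'}, {"&quot;", '"'}, {"&apos;", '\''}
    };

    std::string out;
    out.reserve(in.size());

    for (std::size_t i = 0; i < in.size(); )
    {
        bool matched = false;

        if (in[i] == '&')
        {
            for (const auto & e : table)
            {
                const std::size_t len = std::char_traits<char>::length(e.name);

                if (in.compare(i, len, e.name) == 0)
                {
                    out += e.ch;
                    i += len;
                    matched = true;
                    break;
                }
            }
        }

        // Not a known entity: copy the character unchanged.
        if (!matched)
            out += in[i++];
    }

    return out;
}
```

Because the input is scanned once, a sequence such as "&amp;lt;" decodes to the text "&lt;" and is not decoded a second time.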
Tomasz Sowa
2dadfc0809
added: HTMLParser::ItemParsedListener with an item_parsed(...) method, which is called when the parser has parsed a tag
2021-11-30 16:27:27 +01:00
Tomasz Sowa
c54c398828
fixed in HTMLParser: the </nofilter> tag was printed
2021-10-13 00:40:55 +02:00
Tomasz Sowa
17d2c0fb25
- added some conversion methods: esc_to_json(...), esc_to_xml(...), esc_to_csv() (convert/misc.h)
...
- BaseParser: added the ability to read from TextStream and WTextStream
- HTMLParser: added filter(const WTextStream & in, Stream & out, ...) method
- added utf8_stream.h with one method:
template<typename StreamIteratorType>
size_t utf8_to_int(
StreamIteratorType & iterator_in,
StreamIteratorType & iterator_end,
int & res,
bool & correct)
2021-10-12 19:53:11 +02:00
Tomasz Sowa
4902eb6037
fixed: in HTMLParser::CheckClosingTags() don't return immediately if stack_len is equal to 2
2021-10-03 13:22:49 +02:00
Tomasz Sowa
abe349be34
small refactoring in HTMLParser
2021-10-02 21:01:09 +02:00
Tomasz Sowa
f23cabfb2f
added to HTMLParser: filter_file(...) methods for filtering from a file
2021-10-02 20:34:19 +02:00
Tomasz Sowa
5b2583b566
fixed in HTMLParser: sometimes a closing item was left on the stack; for stack_len < 3 PopStack() was not called
2021-10-02 18:45:02 +02:00
Tomasz Sowa
2576eb12d1
HTMLParser: start working on xml mode
...
added methods:
Status parse_xml_file(const char * file_name, Space & out_space, bool compact_mode = false, bool clear_space = true);
Status parse_xml_file(const std::string & file_name, Space & out_space, bool compact_mode = false, bool clear_space = true);
Status parse_xml_file(const wchar_t * file_name, Space & out_space, bool compact_mode = false, bool clear_space = true);
Status parse_xml_file(const std::wstring & file_name, Space & out_space, bool compact_mode = false, bool clear_space = true);
2021-08-10 21:56:04 +02:00
Tomasz Sowa
b1cc64a29b
added a compact_mode option when creating a space output
2021-08-10 01:45:10 +02:00
Tomasz Sowa
b8a03bf852
HTMLParser: added the ability to parse HTML into the Space class
...
added method: HTMLParser::parse_html(const wchar_t * in, Space & space)
2021-08-07 21:21:16 +02:00
Tomasz Sowa
8c5ede5cf3
HTMLParser: the content of <script> and <!-- (comments) is copied without parsing
2021-08-07 02:13:13 +02:00
Tomasz Sowa
fdfd0b1385
renamed: HTMLFilter -> HTMLParser
2021-08-06 17:10:19 +02:00
Tomasz Sowa
f6df8bc1bc
HTMLFilter: added a std::vector<int> stack for the current white mode; the white chars mode can be changed by these tags: <textarea>, <pre>, <script>, <nofilter>
2021-07-21 15:57:46 +02:00
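The stack of white modes described in the entry above can be pictured as follows. This is a sketch only: the WhiteModeStack class and its methods are hypothetical, the mode names and values are the ones listed in the white_chars_mode(int mode) commit further down this log, and switching to WHITE_MODE_ORIGIN inside the four tags is an assumption about what the filter does.

```cpp
#include <string>
#include <vector>

// Mode values as listed in the white_chars_mode(int mode) commit entry.
enum WhiteMode
{
    WHITE_MODE_ORIGIN = 0,
    WHITE_MODE_SINGLE_LINE = 1,
    WHITE_MODE_TREE = 2
};

// Hypothetical helper: tracks the white mode in effect while tags open and close.
class WhiteModeStack
{
public:
    explicit WhiteModeStack(int mode) : default_mode(mode) {}

    // Assumption: these tags switch to the original (untouched) white characters.
    void open_tag(const std::string & name)
    {
        if (changes_mode(name))
            stack.push_back(WHITE_MODE_ORIGIN);
    }

    // Leaving such a tag restores whatever mode was active before it.
    void close_tag(const std::string & name)
    {
        if (changes_mode(name) && !stack.empty())
            stack.pop_back();
    }

    // The mode on top of the stack, or the default when the stack is empty.
    int current() const
    {
        return stack.empty() ? default_mode : stack.back();
    }

private:
    static bool changes_mode(const std::string & name)
    {
        return name == "textarea" || name == "pre" ||
               name == "script" || name == "nofilter";
    }

    int default_mode;
    std::vector<int> stack;
};
```

A stack rather than a single flag lets nested tags, for example <pre> inside <nofilter>, restore the right mode when the inner tag closes.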
Tomasz Sowa
c0e940c500
fixed improper new line character after <single/> items, added Item::new_line_before flag
2021-07-21 11:30:49 +02:00
Tomasz Sowa
4f8ae6ce29
some work in HTMLFilter
...
- instead of using the pchar pointer directly, we now use pointers/streams from BaseParser
- removed support for putting a white char in long words: removed the BreakWord(size_t break_after_) method
- changed how white characters are treated: added the white_chars_mode(int mode) method
mode 0: WHITE_MODE_ORIGIN
mode 1: WHITE_MODE_SINGLE_LINE
mode 2: WHITE_MODE_TREE
2021-07-20 20:48:01 +02:00
Tomasz Sowa
2a3f43c5c3
added BBCODEParser (html/bbcodeparser.h|cpp) - copied from winix project
2021-07-17 13:54:03 +02:00
Tomasz Sowa
bdb2616f32
added: HTMLFilter (html/htmlfilter.h|cpp) - copied from winix project
2021-07-17 13:35:10 +02:00