Project Description

TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty, and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML. TagSoup also includes a command line processor that reads HTML files, and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.
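Because the parser is exposed through the standard SAX interface, application code only depends on `org.xml.sax.XMLReader` and a `ContentHandler`. The sketch below illustrates that idea with a handler that collects `href` values from `a` elements. The class and method names (`LinkCollector`, `collect`, `collectWellFormed`) are hypothetical and chosen for illustration. To stay runnable without extra jars it uses the JDK's built-in parser, which requires well-formed input; with the TagSoup jar on the classpath, an instance of `org.ccil.cowan.tagsoup.Parser` could presumably be passed to `collect` instead to handle real-world HTML.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical SAX handler that records the href of every "a" element. */
final class LinkCollector extends DefaultHandler {
    private final List<String> hrefs = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Compare case-insensitively, since HTML element names are not case-sensitive.
        if ("a".equalsIgnoreCase(qName)) {
            String href = atts.getValue("href");
            if (href != null) {
                hrefs.add(href);
            }
        }
    }

    /** Runs any SAX XMLReader over the markup and returns the collected links. */
    static List<String> collect(XMLReader reader, String markup)
            throws IOException, SAXException {
        LinkCollector handler = new LinkCollector();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return handler.hrefs;
    }

    /** Convenience wrapper using the JDK parser; the input must be well-formed. */
    static List<String> collectWellFormed(String markup) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            return collect(reader, markup);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><a href=\"a.html\">A</a>"
                + "<p><a href=\"b.html\">B</a></p></body></html>";
        System.out.println(collectWellFormed(page));
    }
}
```

The handler never refers to a concrete parser class, which is the design point: swapping the `XMLReader` implementation should not require changes to the code that consumes the events.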

System Requirements

System requirements are not defined.

Information regarding Project Releases and Project Resources. Note that the information here is quoted from the Freecode.com page, and the downloads themselves may not be hosted on OSDN.

2007-03-21 21:03
1.0.5

Tags: Major bugfixes
The main issue was with HTML comments, which were very badly broken: any > character would terminate one, so commenting out elements did not work properly. Everything should now be correct. Everyone who possibly can should update. Additionally, &#Xnnnn (with a capital X) now works, some debugging code was removed from PYXWriter, a Unicode BOM at the beginning of a document is skipped, and the new version of Saxon is supported as an XSLT processor. Documentation has been added on SAX features and properties specific to TagSoup.
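Two of the behaviors in this release note can be illustrated with a short sketch: accepting both `x` and `X` as the hexadecimal marker in a numeric character reference, and skipping a Unicode BOM at the start of a document. This is not TagSoup's actual code; the class `CharRefSketch` and its methods are hypothetical names, and the helper assumes it receives only the text between `&` and `;`.

```java
/** Hypothetical helpers; an illustration only, not TagSoup's implementation. */
final class CharRefSketch {

    /**
     * Decodes the body of a numeric character reference, i.e. the text
     * between '&' and ';', such as "#x41", "#X41", or "#65".
     */
    static String decodeNumericRef(String body) {
        if (body.length() < 2 || body.charAt(0) != '#') {
            throw new IllegalArgumentException("not a numeric reference: " + body);
        }
        char marker = body.charAt(1);
        // Accept the hex marker in either case.
        boolean hex = marker == 'x' || marker == 'X';
        int codePoint = hex
                ? Integer.parseInt(body.substring(2), 16)
                : Integer.parseInt(body.substring(1), 10);
        return new String(Character.toChars(codePoint));
    }

    /** Drops a leading byte order mark (U+FEFF) from already-decoded text. */
    static String skipBom(String text) {
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        System.out.println(decodeNumericRef("#X41"));
        System.out.println(decodeNumericRef("#65"));
        System.out.println(skipBom("\uFEFF<html>"));
    }
}
```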

2007-02-07 09:11
1.0.3

Tags: Minor bugfixes
A DOCTYPE declaration will be output if there is one in the input. The --ignorable switch was added to preserve whitespace in element content. The --output-encoding switch was added to specify output encoding. The default values for html/@version were removed. Various minor bugs were fixed.

2006-06-15 23:38
1.0

Tags: Code cleanup
All known bugs are fixed and all features considered appropriate have been added. This release is ready for full production use.

2003-01-23 09:05
0.8

Tags: Initial freshmeat announcement

Project Resources