yeomanly

jTidy cfc (stand alone and CFWheels plugin)

2010 January 02
tags: CFWheels · ColdFusion
by Mike Henke

"JTidy is a Java port of HTML Tidy, a HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML." More information

I was looking for something to clean up some HTML before generating PDFs with cfdocument, and found a jtidy.cfc that FarCry uses to return valid XHTML.

If you aren't familiar with it, "FarCry Core is a web application framework based on the ColdFusion language. FarCry CMS is a popular content management solution built with FarCry Core."

I reviewed the license, and from my understanding of "GNU GPL License v3 (GPL)" I can publish my modifications. I added JavaLoader and made some slight modifications to the code. My code changes are here for review and download: jTidy CFC
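For readers who haven't used JavaLoader, here is a rough sketch of the general pattern for loading JTidy through it. This is not the code from the component. The jar path, the `javaloader` mapping, and the variable names are assumptions for illustration only.

```cfm
<!--- Hypothetical jar location; point this at wherever the JTidy jar actually lives --->
<cfset jarPaths = [ expandPath("/jtidy_cfc/lib/jtidy.jar") ] />

<!--- Assumes JavaLoader is reachable as "javaloader"; it loads the jar in its own class loader --->
<cfset loader = createObject("component", "javaloader.JavaLoader").init(jarPaths) />

<!--- Create a Tidy instance and ask for XHTML output --->
<cfset tidy = loader.create("org.w3c.tidy.Tidy").init() />
<cfset tidy.setXHTML(true) />
<cfset tidy.setQuiet(true) />
<cfset tidy.setShowWarnings(false) />

<!--- JTidy reads from and writes to Java streams, so wrap the string first --->
<cfset dirtyHtml = "<p>Cars & Trucks" />
<cfset inStream = createObject("java", "java.io.ByteArrayInputStream").init(dirtyHtml.getBytes("UTF-8")) />
<cfset outStream = createObject("java", "java.io.ByteArrayOutputStream").init() />
<cfset tidy.parse(inStream, outStream) />

<!--- The cleaned markup comes back as a string --->
<cfset cleanHtml = outStream.toString("UTF-8") />
```

The appeal of JavaLoader here is that the jar travels with the component, so nothing has to be added to the server's class path.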

Implementation is really easy for this version of jTidy with ColdFusion.

Drop the jtidy_cfc folder in your webroot or add a ColdFusion mapping to it, then invoke jtidy.cfc and pass in the HTML you want returned as valid XHTML.
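If you would rather keep the folder outside the webroot, a per-application mapping is one way to do it. This is a minimal sketch assuming ColdFusion 8 or later; the application name and the folder location are made up for the example.

```cfm
<!--- Application.cfc (sketch) --->
<cfcomponent output="false">

   <!--- Hypothetical application name --->
   <cfset this.name = "jtidyDemo" />

   <!--- Hypothetical folder location; the key must match the first part of the component path --->
   <cfset this.mappings["/jtidy_cfc"] = "C:/libs/jtidy_cfc" />

</cfcomponent>
```

With that mapping in place, the dotted path jtidy_cfc.jtidy resolves the same way it would if the folder sat in the webroot.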

Here is the code for my sample. It takes some invalid XHTML examples and fixes them.

<!--- see readme.txt for testing this example file --->

<!--- component path to jtidy.cfc --->
<cfset componentPath = "jtidy_cfc.jtidy" />

<cfsavecontent variable="test">
<html>
   <head>
      <title>jtidy test page</title>
   </head>
   <body>
   
      <!-- examples from http://en.wikipedia.org/wiki/XHTML -->
      <table id="companyAccountsTable">
         <tbody>
            <tr>
               <td>mike henke</td>
            </tr>
         </tbody>
      </table>
      
      <form action="/index.cfm">
      
      <!-- Not putting quotation marks around attribute values -->
      <input type=text value=hello />
   
      </form>
      
      <!-- Not closing non-empty elements -->
      <p>
      
      <!-- Improperly nesting elements -->
      <em><strong>This is some text.</em></strong>
      
      <!-- Using the ampersand character outside of entities -->
      <div>Cars & Trucks</div>
      
      <!-- Not closing empty elements -->
      <br>
      
      <!-- Using the ampersand character outside of entities -->
      <div><a href="index.cfm?page=news&id=5">News</a></div>
      
      <!-- Using attribute minimization -->
      <div><textarea readonly>READ-ONLY</textarea></div>
      
      <!-- Failing to recognize that XHTML elements and attributes are case sensitive -->
      <P ID="ONE">The Best Page Ever</P>
   
   </body>
</html>
</cfsavecontent>

<cfinvoke
   component="#componentPath#"
   method="makexHTMLValid"
   strToParse="#test#"
   returnvariable="validxHTML"
/>

   
<!--- <cfdump var="#validxHTML#"> --->

<cfoutput>#validxHTML#</cfoutput>
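Since the original goal was PDF generation, the cleaned markup can then be handed to cfdocument. A minimal sketch, assuming the validxHTML variable from the sample above; the output file name is hypothetical.

```cfm
<!--- Write the tidied markup to a PDF next to this template (hypothetical file name) --->
<cfdocument format="PDF" filename="#expandPath('./jtidy-test.pdf')#" overwrite="true">
   <cfoutput>#validxHTML#</cfoutput>
</cfdocument>
```

Leaving off the filename attribute should stream the PDF to the browser instead of saving it to disk.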


I also created a jTidy CFC plugin for CFWheels.


4 Responses
  1. Allen
    Jan 2, 2010 at 11:04 AM

    Very handy. Thanks! I haven't looked into this more but any chance with a little work this could be used as part of unit testing to check code to make sure the html is valid?

  2. Mike Henke
    Jan 2, 2010 at 11:32 AM

    There is a jtidy ant task http://bit.ly/7EPE2F but the trick for ColdFusion developers would be correctly parsing the ColdFusion tag and script syntax.

  3. Sam Farmer
    Jan 2, 2010 at 4:20 PM

    Looks very cool.

  4. Geoff Bowers
    Jan 6, 2010 at 9:40 PM

    Mat Bryant, the current development lead, was responsible for the original component. Lot of hidden gems in the FarCry code base :)
