I'm currently working on a web application written in Java. Part of this application generates summaries of long HTML texts by truncating them to a fixed number of words. In Python, I usually use truncate_html_words from the Django template engine, so I looked for a similarly easy method in Java. I did a couple of searches on Google but couldn't find anything quick and easy. So I went back and looked at Django's code, and fortunately it was very straightforward. Find the adapted code below:
/**
 * Copyright (c) Django Software Foundation and individual contributors.
 * All rights reserved.
 *
 * Copyright (c) 2011 Masood Behabadi <masood@dentcat.com>
 *
 * Redistribution and use in source and binary forms, with or without modification,
 * are permitted provided that the following conditions are met:
 *
 * 1. Redistributions of source code must retain the above copyright notice,
 *    this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * 3. Neither the name of Django nor the names of its contributors may be used
 *    to endorse or promote products derived from this software without
 *    specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
 * ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
 * ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

// Imports required by this method (it must live inside a class of your own):
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String truncateHtmlWords(String html, int length) {
    if (length <= 0)
        return "";

    List<String> html4Singlets = Arrays.asList(
            "br", "col", "link", "base", "img", "param", "area", "hr", "input");

    // Set up regular expressions
    Pattern pWords = Pattern.compile("&.*?;|<.*?>|(\\w[\\w-]*)");
    Pattern pTag = Pattern.compile("<(/)?([^ ]+?)(?: (/)| .*?)?>");
    Matcher mWords = pWords.matcher(html);

    // Count non-HTML words and keep note of open tags
    int endTextPos = 0;
    int words = 0;
    List<String> openTags = new ArrayList<String>();
    while (words <= length) {
        if (!mWords.find())
            break; // Checked through the whole string

        if (mWords.group(1) != null) {
            // It's an actual non-HTML word
            words += 1;
            if (words == length)
                endTextPos = mWords.end();
            continue;
        }

        // Check for tag
        Matcher tag = pTag.matcher(mWords.group());
        if (!tag.find() || endTextPos != 0)
            // Don't worry about non tags or tags after our truncate point
            continue;

        String closingTag = tag.group(1);
        // Element names are always case-insensitive
        String tagName = tag.group(2).toLowerCase();
        String selfClosing = tag.group(3);

        if (closingTag != null) {
            // A closing tag also closes any unclosed tags opened after its match
            int i = openTags.indexOf(tagName);
            if (i != -1)
                // Copy the remainder instead of keeping a subList view
                openTags = new ArrayList<String>(openTags.subList(i + 1, openTags.size()));
        } else if (selfClosing == null && !html4Singlets.contains(tagName)) {
            // Add it to the start of the open tags list
            openTags.add(0, tagName);
        }
    }

    // Don't try to close tags if we don't need to truncate
    if (words <= length)
        return html;

    // Close any tags still open at the truncation point
    StringBuilder out = new StringBuilder(html.substring(0, endTextPos));
    for (String tag : openTags)
        out.append("</").append(tag).append(">");

    return out.toString();
}
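To see how the word pattern above tells real words apart from markup, here is a minimal sketch that only counts words. The class name HtmlWordCounter and the helper countWords are hypothetical names made up for this illustration; the regular expression is the same one used for pWords.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class HtmlWordCounter {
    // Same alternation as pWords: entity | tag | (word).
    // Only the third alternative captures group 1, so group 1 marks a real word.
    private static final Pattern WORDS = Pattern.compile("&.*?;|<.*?>|(\\w[\\w-]*)");

    // Counts words while skipping HTML tags and character entities.
    static int countWords(String html) {
        Matcher m = WORDS.matcher(html);
        int words = 0;
        while (m.find()) {
            if (m.group(1) != null) {
                words++;
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Tags and the &amp; entity are matched but not counted: prints 4
        System.out.println(countWords("<p>Hello <b>big</b> world &amp; more</p>"));
    }
}
```

Because tags and entities are consumed by the first two alternatives, their contents never reach the word alternative, which is why "amp" is not counted as a word.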
Feel free to use the code under the Modified BSD License, but keep in mind that, unlike Django's version, it has not been thoroughly tested and may not function correctly in all cases.
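Since the port is not thoroughly tested, the open-tag bookkeeping is one part worth checking on its own. The following is only a sketch: OpenTagTracker and closingTagsFor are hypothetical names, and the helper mirrors the tag-handling logic of the method above without doing any truncation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of the original code.
class OpenTagTracker {
    private static final List<String> SINGLETS = Arrays.asList(
            "br", "col", "link", "base", "img", "param", "area", "hr", "input");
    private static final Pattern ANY_TAG = Pattern.compile("<.*?>");
    // Same expression as pTag: optional "/", tag name, optional " /" or attributes.
    private static final Pattern TAG = Pattern.compile("<(/)?([^ ]+?)(?: (/)| .*?)?>");

    // Returns the closing tags needed to balance the given HTML fragment.
    static String closingTagsFor(String fragment) {
        List<String> openTags = new ArrayList<String>(); // innermost tag first
        Matcher token = ANY_TAG.matcher(fragment);
        while (token.find()) {
            Matcher tag = TAG.matcher(token.group());
            if (!tag.matches()) {
                continue;
            }
            boolean closing = tag.group(1) != null;
            String name = tag.group(2).toLowerCase();
            boolean selfClosing = tag.group(3) != null;
            if (closing) {
                int i = openTags.indexOf(name);
                if (i != -1) {
                    // Drop the matched tag and anything opened inside it.
                    openTags = new ArrayList<String>(openTags.subList(i + 1, openTags.size()));
                }
            } else if (!selfClosing && !SINGLETS.contains(name)) {
                openTags.add(0, name);
            }
        }
        StringBuilder out = new StringBuilder();
        for (String name : openTags) {
            out.append("</").append(name).append(">");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Three tags are still open, innermost first: prints </b></p></div>
        System.out.println(closingTagsFor("<div><p>Hello <b>wor"));
    }
}
```

Keeping the innermost tag at index 0 means the list can be written out in order to produce correctly nested closing tags.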
Links:
Django Project Website
Original Source Code