Saturday, February 9, 2013

Strip HTML tags and extract a substring from text using regular expressions in C#

Today I am presenting a quick tip on how to strip HTML from text using a regular expression (with the Regex class) in C#. When presenting a blurb or a summary limited to a certain number of characters, we may need to remove the HTML tags from an HTML string (news details, article details, etc.). I have the following function in my Helper library for this very problem.


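    /// <summary>
    /// Strip out HTML tags from text
    /// </summary>
    /// <param name="source">Source string</param>
    /// <returns>The source string with everything between '<' and '>' removed</returns>
    public static string StripTagsFromHtml(string source)
    {
        // Requires: using System.Text.RegularExpressions;
        // "<.*?>" lazily matches from a '<' to the nearest '>' on the same line.
        return Regex.Replace(source, "<.*?>", string.Empty);
    }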
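
As a quick check of the function above, here is a small self-contained sketch. The `Helper` and `Program` class names and the sample strings are only my assumptions for illustration. The second method, `StripTagsFromHtmlMultiline`, is a hypothetical variant that is not part of the original helper: it passes `RegexOptions.Singleline` so that `.` also matches a newline, which matters when a tag is broken across lines.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Helper
{
    // Same pattern as in the post, repeated so this sample compiles on its own.
    public static string StripTagsFromHtml(string source)
    {
        return Regex.Replace(source, "<.*?>", string.Empty);
    }

    // Hypothetical variant: with RegexOptions.Singleline, '.' also matches '\n',
    // so a tag that spans several lines is removed as well.
    public static string StripTagsFromHtmlMultiline(string source)
    {
        return Regex.Replace(source, "<.*?>", string.Empty, RegexOptions.Singleline);
    }
}

public static class Program
{
    public static void Main()
    {
        string html = "<p>Hello <b>world</b>!</p>";
        Console.WriteLine(Helper.StripTagsFromHtml(html)); // Hello world!

        // The opening tag contains a line break before the attribute.
        string multiline = "<a\n href=\"#\">link</a>";
        Console.WriteLine(Helper.StripTagsFromHtmlMultiline(multiline)); // link
    }
}
```

Note that this approach only removes the tags themselves. Entities such as `&amp;` stay in the text as they are, and for heavily malformed markup a real HTML parser is probably the safer choice.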


To extract only a given number of characters from the stripped string, we can add an overload as follows.

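    /// <summary>
    /// Strip out HTML tags from text and return an extract of it
    /// </summary>
    /// <param name="source">Source string</param>
    /// <param name="characterCount">Number of characters to extract</param>
    /// <returns>The stripped text, truncated to at most characterCount characters</returns>
    public static string StripTagsFromHtml(string source, int characterCount)
    {
        string stripped = Regex.Replace(source, "<.*?>", string.Empty);
        // Return the whole text when it is already short enough.
        if (stripped.Length <= characterCount)
            return stripped;
        else
            return stripped.Substring(0, characterCount);
    }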
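
To see the overload in action, here is a minimal sketch that builds a short blurb. Again, the `Helper` and `Program` class names and the sample markup are assumptions made for this example only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Helper
{
    // Same logic as the overload in the post, repeated so this sample compiles on its own.
    public static string StripTagsFromHtml(string source, int characterCount)
    {
        string stripped = Regex.Replace(source, "<.*?>", string.Empty);
        return stripped.Length <= characterCount
            ? stripped
            : stripped.Substring(0, characterCount);
    }
}

public static class Program
{
    public static void Main()
    {
        string article = "<p>The <b>quick</b> brown fox jumps.</p>";

        // The stripped text is "The quick brown fox jumps." (26 characters).
        Console.WriteLine(Helper.StripTagsFromHtml(article, 9));   // The quick
        Console.WriteLine(Helper.StripTagsFromHtml(article, 100)); // The quick brown fox jumps.
    }
}
```

The cut is made purely by character count, so a blurb can end in the middle of a word, and a negative `characterCount` would make `Substring` throw for non-empty text.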

Happy programming!
