When working with websites, you often need to strip HTML tags, tag attributes, or the complete contents of an HTML tag from some text. Regular expressions make this easy, so we thought we would share a few that we use all the time.
Find HTML Tags
<.*?>
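This expression finds every HTML opening and closing tag, with or without attributes, so you can use it to strip all HTML tags from an input string. The lazy quantifier (.*?) ends each match at the first closing >. Because . does not match a newline by default, a tag that wraps across lines is only matched when the single-line option is enabled.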
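As a minimal sketch of how this first expression might be used to strip tags, consider the helper below. The class name TagStripper and method name StripTags are our own hypothetical names for illustration, and enabling RegexOptions.Singleline is an assumption we add so that tags spanning several lines are also matched.

```csharp
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Hypothetical helper: removes anything that looks like an HTML tag.
    public static string StripTags(string html)
    {
        // Singleline makes '.' match newlines too, so a tag that wraps
        // across lines is still matched by the lazy <.*?> pattern.
        return Regex.Replace(html, "<.*?>", string.Empty, RegexOptions.Singleline);
    }
}
```

Calling TagStripper.StripTags("<p>Hello <b>world</b></p>") leaves only the text between the tags. The pattern does not understand HTML, so a > inside an attribute value or a stray < in the text can produce surprising matches.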
Find HTML Tag and Content
<head.*?>(.|\n)*?</head>
This expression matches an opening <head> tag, everything that follows it, and the closing </head> tag, so it can be used to remove the complete <head> section from a document. The (.|\n)*? part matches any character, including newlines, up to the first </head>.
Using the Regular Expressions
The following C# code uses the second regular expression to remove the <head> section from the HTML content by replacing it with an empty string:
using System.Text.RegularExpressions;
...
string content = "<html><head><title>Using Regular Expressions</title></head><body><h1>Using Regular Expressions</h1><p>Regular expressions are really quite powerful and can make replacing HTML really easy.";
string pattern = "<head.*?>(.|\n)*?</head>";
string replacedContent = Regex.Replace(content, pattern, string.Empty);
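One variation you might consider, sketched here under our own assumptions, tightens the <head> pattern in three ways: \b stops it from also matching <header>, RegexOptions.Singleline replaces the (.|\n) alternation, and RegexOptions.IgnoreCase handles <HEAD>. The names HeadRemover and RemoveHead are hypothetical and chosen only for this example.

```csharp
using System.Text.RegularExpressions;

public static class HeadRemover
{
    // Hypothetical helper: removes the <head>...</head> section, if present.
    public static string RemoveHead(string html)
    {
        // \b     : word boundary, so "<header>" is not treated as "<head>"
        // [^>]*  : skips any attributes on the opening tag
        // .*?    : lazily matches the content (newlines included via Singleline)
        const string pattern = @"<head\b[^>]*>.*?</head>";
        return Regex.Replace(html, pattern, string.Empty,
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
    }
}
```

This is a sketch rather than a replacement for the expression above. It behaves the same on the sample content and differs only on input that contains <header> elements or upper-case tags.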
To remove all HTML attributes from some HTML, you could use the first regular expression with a MatchEvaluator:
string content = "<div class=\"a-class\" id=\"an-id\">Strip <em style=\"color:#0f0\">any</em> HTML attributes from this content</div>";
string pattern = "<.*?>";
string filteredContent = System.Text.RegularExpressions.Regex.Replace(content, pattern, delegate(System.Text.RegularExpressions.Match match)
{
    // called once for each match, that is, once for each tag
    string m = match.ToString();
    // keep everything before the first space (the tag name) and drop the attributes
    int spacePosition = m.IndexOf(" ");
    if (spacePosition >= 0)
    {
        return m.Substring(0, spacePosition) + ">";
    }
    else
    {
        return m;
    }
});
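The same idea can also be sketched without a MatchEvaluator by capturing the tag name and substituting it back. The pattern below is our own assumption rather than one of the two expressions above, and the names AttributeStripper and StripAttributes are hypothetical. It also keeps the trailing slash of self-closing tags, which the space-based approach drops.

```csharp
using System.Text.RegularExpressions;

public static class AttributeStripper
{
    // Hypothetical helper: removes attributes from opening tags.
    public static string StripAttributes(string html)
    {
        // <(\w+)   : opening bracket and tag name (captured as $1)
        // \s[^>]*? : whitespace followed by the attributes to discard
        // (/?)>    : optional self-closing slash (captured as $2) and closing bracket
        const string pattern = @"<(\w+)\s[^>]*?(/?)>";
        return Regex.Replace(html, pattern, "<$1$2>");
    }
}
```

Closing tags such as </div> and tags without attributes do not match the pattern and are left unchanged. As with the other expressions, a > inside an attribute value would end the match early.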
4 Comments
Please help me replace the text between specified tags.
For example, replace the text between these tags
Hi Raj. Your tags look to have been stripped out. Send me an email with an example and I’ll have a look at it for you.
Thanks for your reply.
I have emailed you my request.
Thanks so much for this post. Finally it resolved my issue 🙂