Monday, September 25, 2017

Xây dựng lớp tĩnh HtmlRemoval bằng C#


Bây h mình sẽ hướng dẫn các bạn Xây dựng lớp tĩnh HtmlRemoval bằng C# giúp các bạn học lập trình tốt hơn
Trích dẫn:
using System;
using System.Text.RegularExpressions;

/// <summary>
/// Methods to remove HTML from strings.
/// </summary>
public static class HtmlRemoval
{
/// <summary>
/// Remove HTML from string with Regex.
/// </summary>
public static string StripTagsRegex(string source)
{
return Regex.Replace(source, "<.*?>", string.Empty);
}

/// <summary>
/// Compiled regular expression for performance.
/// </summary>
static Regex _htmlRegex = new Regex("<.*?>", RegexOptions.Compiled);

/// <summary>
/// Remove HTML from string with compiled Regex.
/// </summary>
public static string StripTagsRegexCompiled(string source)
{
return _htmlRegex.Replace(source, string.Empty);
}

/// <summary>
/// Remove HTML tags from string using char array.
/// </summary>
public static string StripTagsCharArray(string source)
{
char[] array = new char[source.Length];
int arrayIndex = 0;
bool inside = false;

for (int i = 0; i < source.Length; i++)
{
char let = source[i];
if (let == '<')
{
inside = true;
continue;
}
if (let == '>')
{
inside = false;
continue;
}
if (!inside)
{
array[arrayIndex] = let;
arrayIndex++;
}
}
return new string(array, 0, arrayIndex);
}
}

Viết chương trình để kiểm tra các thủ tục trên như sau:


Program that tests HTML removal: C#
Trích dẫn:
using System;
using System.Text.RegularExpressions;

class Program
{
static void Main()
{
const string html = "<p>There was a <b>.NET</b> programmer " +
"and he stripped the <i>HTML</i> tags.</p>";

Console.WriteLine(HtmlRemoval.StripTagsRegex(html));
Console.WriteLine(HtmlRemoval.StripTagsRegexCompiled(html));
Console.WriteLine(HtmlRemoval.StripTagsCharArray(html));
}
}

Output

There was a .NET programmer and he stripped the HTML tags.
There was a .NET programmer and he stripped the HTML tags.
There was a .NET programmer and he stripped the HTML tags.


Kiểm tra tốc độ xử lý của 3 thủ tục trên

Removing HTML tags from strings

Input: <p>The <b>dog</b> is <i>cute</i>.</p>
Output: The dog is cute.

Performance test for HTML removal

HtmlRemoval.StripTagsRegex: 2404 ms
HtmlRemoval.StripTagsRegexCompiled: 1366 ms
HtmlRemoval.StripTagsCharArray: 287 ms [fastest]

File length test for HTML removal

File length before: 8085 chars
HtmlRemoval.StripTagsRegex: 4382 chars
HtmlRemoval.StripTagsRegexCompiled: 4382 chars
HtmlRemoval.StripTagsCharArray: 4382 chars

Chúc các bạn thành công!

No comments:

Post a Comment