Tuesday, January 29, 2008

Downloading Web Content Without the WebClient Class

Keywords: WebClient, UTF-8, WebRequest


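To download a web page or an XML document from a URI, you can use the WebClient class (in the System.Net namespace), which is very simple. It has many download methods, such as DownloadString() and DownloadFile(). These are synchronous; it also has asynchronous download methods, but the synchronous ones are the ones used most often.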
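As a minimal sketch of the two synchronous calls mentioned above: the class name WebClientDemo and the helper names Fetch and Save are made up for illustration, and only WebClient.DownloadString and WebClient.DownloadFile come from the framework.

```csharp
using System;
using System.Net;

public static class WebClientDemo
{
    // Hypothetical helper: downloads the resource at the address and returns it as a string.
    public static string Fetch(Uri address)
    {
        using (WebClient client = new WebClient())
        {
            return client.DownloadString(address);
        }
    }

    // Hypothetical helper: downloads the resource at the address and saves it to a local file.
    public static void Save(Uri address, string path)
    {
        using (WebClient client = new WebClient())
        {
            client.DownloadFile(address, path);
        }
    }
}
```

A call would look like WebClientDemo.Fetch(new Uri("http://example.com/")), where the address is only a placeholder.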


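However, when I used WebClient.DownloadString() to fetch an XML file that had been saved in UTF-8 encoding, the resulting XML string was garbled. No matter how I set the WebClient's Encoding property, it made no difference. This may be a bug in WebClient.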
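Garbled text of this kind typically appears when UTF-8 bytes are decoded with a different encoding. The sketch below only illustrates that general effect with an in-memory stream; it is an assumption about the likely cause, not a confirmed diagnosis of the WebClient behavior described here. The names EncodingDemo and Decode and the sample string are invented for the example.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: decodes the bytes with the given encoding via StreamReader.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        using (MemoryStream stream = new MemoryStream(bytes))
        using (StreamReader reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Sample XML fragment containing two Chinese characters, written as escapes.
        string original = "<title>\u7F51\u9875</title>";
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);

        // Decoding UTF-8 bytes as ISO-8859-1 turns each multi-byte character into several wrong ones.
        string wrong = Decode(utf8Bytes, Encoding.GetEncoding("iso-8859-1"));
        // Decoding with the encoding the bytes were written in restores the text.
        string right = Decode(utf8Bytes, Encoding.UTF8);

        Console.WriteLine(wrong == original); // False
        Console.WriteLine(right == original); // True
    }
}
```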


The same synchronous DownloadString functionality can be implemented with WebRequest instead. The code is as follows:




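// Requires a reference to System.Xml (using System.Xml;) for XmlDocument.
public void DownloadAndLoadXml(string uri, XmlDocument doc)
{
    System.Net.WebRequest request = System.Net.WebRequest.Create(uri);
    using (System.Net.WebResponse response = request.GetResponse())
    using (System.IO.Stream stream = response.GetResponseStream())
    {
        // A network response stream is not seekable, so Length and Position
        // cannot be used on it. Load the document directly from the stream;
        // XmlDocument.Load detects the encoding from the XML content itself.
        // Note: an empty response makes Load throw an XmlException.
        doc.Load(stream);
    }
}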



/// <summary>
/// Downloads the content at the given URI and returns it as a string.
/// </summary>
/// <param name="uri">Address of the web page or XML file to download.</param>
/// <returns>The response body, decoded as UTF-8 unless a byte order mark says otherwise.</returns>
public string DownloadStringFromWeb(string uri)
{
    System.Net.WebRequest request = System.Net.WebRequest.Create(uri);
    using (System.Net.WebResponse response = request.GetResponse())
    using (System.IO.Stream stream = response.GetResponseStream())
    // UTF-8 is also StreamReader's default; it is spelled out here to make the intent clear.
    using (System.IO.StreamReader reader = new System.IO.StreamReader(stream, System.Text.Encoding.UTF8))
    {
        // Read the whole response body into a string.
        return reader.ReadToEnd();
    }
}
