C#/.NET-最小并行化的Quicksort导致性能下降
内容导读
互联网集市收集整理的这篇技术教程文章主要介绍了C#/.NET-最小并行化的Quicksort导致性能下降,小编现在分享给大家,供广大互联网技能从业者学习和参考。文章包含5329字,纯文字阅读大概需要8分钟。
内容图文
![C#/.NET-最小并行化的Quicksort导致性能下降](/upload/InfoBanner/zyjiaocheng/685/fd47aa432c6940f1ab3880ae30c5ac6a.jpg)
我目前正在为List类开发递归并行的Quicksort扩展功能.下面的代码代表了我所考虑的最基本的线程分配标准,因为从概念上讲,它应该是最简单的.它分支到检测到的处理器数量的以2为底的对数的深度,并从那里顺序进行.因此,每个CPU应该获得一个线程,该线程具有(大致)相等的大量数据要处理,从而避免了过多的开销时间.提供了基本的顺序算法进行比较.
public static class Quicksort
{
/// <summary>
/// Helper class to hold information about when to parallelize
/// </summary>
/// <attribute name="maxThreads">Maximum number of supported threads</attribute>
/// <attribute name="threadDepth">The depth to which new threads should
/// automatically be made</attribute>
private class ThreadInfo
{
internal int maxThreads;
internal int threadDepth;
public ThreadInfo(int length)
{
maxThreads = Environment.ProcessorCount;
threadDepth = (int)Math.Log(maxThreads, 2);
}
}
/// <summary>
/// Helper function to perform the partitioning step of quicksort
/// </summary>
/// <param name="list">The list to partition</param>
/// <param name="start">The starting index</param>
/// <param name="end">The ending index/param>
/// <returns>The final index of the pivot</returns>
public static int Partition<T>(this List<T> list, int start, int end) where T: IComparable
{
int middle = (int)(start + end) / 2;
// Swap pivot and first item.
var temp = list[start];
list[start] = list[middle];
list[middle] = temp;
var pivot = list[start];
var swapPtr = start;
for (var cursor = start + 1; cursor <= end; cursor++)
{
if (list[cursor].CompareTo(pivot) < 0)
{
// Swap cursor element and designated swap element
temp = list[cursor];
list[cursor] = list[++swapPtr];
list[swapPtr] = temp;
}
}
// Swap pivot with final lower item
temp = list[start];
list[start] = list[swapPtr];
list[swapPtr] = temp;
return swapPtr;
}
/// <summary>
/// Method to begin parallel quicksort algorithm on a Comparable list
/// </summary>
/// <param name="list">The list to sort</param>
public static void QuicksortParallel<T>(this List<T> list) where T : IComparable
{
if (list.Count < 2048)
list.QuicksortSequential();
else
{
var info = new ThreadInfo(list.Count);
list.QuicksortRecurseP(0, list.Count - 1, 0, info);
}
}
/// <summary>
/// Method to implement parallel quicksort recursion on a Comparable list
/// </summary>
/// <param name="list">The list to sort</param>
/// <param name="start">The starting index of the partition</param>
/// <param name="end">The ending index of the partition (inclusive)</param>
/// <param name="depth">The current recursive depth</param>
/// <param name="info">Structure holding decision-making info for threads</param>
private static void QuicksortRecurseP<T>(this List<T> list, int start, int end, int depth,
ThreadInfo info)
where T : IComparable
{
if (start >= end)
return;
int middle = list.Partition(start, end);
if (depth < info.threadDepth)
{
var t = Task.Run(() =>
{
list.QuicksortRecurseP(start, middle - 1, depth + 1, info);
});
list.QuicksortRecurseP(middle + 1, end, depth + 1, info);
t.Wait();
}
else
{
list.QuicksortRecurseS(start, middle - 1);
list.QuicksortRecurseS(middle + 1, end);
}
}
/// <summary>
/// Method to begin sequential quicksort algorithm on a Comparable list
/// </summary>
/// <param name="list">The list to sort</param>
public static void QuicksortSequential<T>(this List<T> list) where T : IComparable
{
list.QuicksortRecurseS(0, list.Count - 1);
}
/// <summary>
/// Method to implement sequential quicksort recursion on a Comparable list
/// </summary>
/// <param name="list">The list to sort</param>
/// <param name="start">The starting index of the partition</param>
/// <param name="end">The ending index of the partition (inclusive)</param>
private static void QuicksortRecurseS<T>(this List<T> list, int start, int end) where T : IComparable
{
if (start >= end)
return;
int middle = list.Partition(start, end);
// Now recursively sort the (approximate) halves.
list.QuicksortRecurseS(start, middle - 1);
list.QuicksortRecurseS(middle + 1, end);
}
}
据我了解,这种方法应该产生一次性的启动成本,然后以比顺序方法更快的速度对其余数据进行排序.但是,并行方法比顺序方法要花费更多的时间,顺序方法实际上会随着负载的增加而增加.以4核CPU上的一千万个项目为基准,顺序方法平均需要18秒才能完成,而并行方法最多需要26秒.增加允许的螺纹深度会迅速加剧该问题.
感谢您在寻找性能猪方面的帮助.谢谢!
解决方法:
问题是CPU缓存冲突,也称为“错误共享”.
除非枢轴点的内存地址恰好在cache line上,否则其中一个线程将在L1或L2缓存上获得锁,而另一个线程将不得不等待.这可能使性能甚至比串行解决方案还要差.该问题在this article中得到了很好的描述:
…where threads use different objects but those objects happen to be close enough in memory that they fall on the same cache line, and the cache system treats them as a single lump that is effectively protected by a hardware write lock that only one core can hold at a time. [1,2] This causes real but invisible performance contention; whichever thread currently has exclusive ownership so that it can physically perform an update to the cache line will silently throttle other threads that are trying to use different (but, alas, nearby) data that sits on the same line.
(snip)
In most cases, the parallel code ran actually ran slower than the sequential code, and in no case did we get any better than a 42% speedup no matter how many cores we threw at the problem.
抓稻草….如果将列表分成两个对象,然后将其替换为pin them in memory(甚至只是将它们填充在结构中),则可能会做得更好,因此它们之间的距离足够远,因此不会发生缓存冲突.
内容总结
以上是互联网集市为您收集整理的C#/.NET-最小并行化的Quicksort导致性能下降全部内容,希望文章能够帮你解决C#/.NET-最小并行化的Quicksort导致性能下降所遇到的程序开发问题。 如果觉得互联网集市技术教程内容还不错,欢迎将互联网集市网站推荐给程序员好友。
内容备注
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 gblab@vip.qq.com 举报,一经查实,本站将立刻删除。
内容手机端
扫描二维码推送至手机访问。