Return-Path:
Received: from pobox.devel.redhat.com ([unix socket])
	by pobox.devel.redhat.com (Cyrus v2.2.12-Invoca-RPM-2.2.12-8.1.RHEL4) with LMTPA;
	Mon, 30 Jun 2008 16:55:27 -0400
X-Sieve: CMU Sieve 2.2
Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254])
	by pobox.devel.redhat.com (8.13.1/8.13.1) with ESMTP id m5UKtRIl021320
	for ; Mon, 30 Jun 2008 16:55:27 -0400
Received: from mx3.redhat.com (mx3.redhat.com [172.16.48.32])
	by int-mx1.corp.redhat.com (8.13.1/8.13.1) with ESMTP id m5UKtQfB007031;
	Mon, 30 Jun 2008 16:55:26 -0400
Received: from mga11.intel.com (mga11.intel.com [192.55.52.93])
	by mx3.redhat.com (8.13.8/8.13.8) with ESMTP id m5UKtF1i017377;
	Mon, 30 Jun 2008 16:55:16 -0400
Received: from fmsmga001.fm.intel.com ([10.253.24.23])
	by fmsmga102.fm.intel.com with ESMTP; 30 Jun 2008 13:53:22 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="4.27,728,1204531200";
	d="scan'208";a="583595535"
Received: from linux-os.sc.intel.com ([172.25.110.8])
	by fmsmga001.fm.intel.com with ESMTP; 30 Jun 2008 13:56:25 -0700
Received: by linux-os.sc.intel.com (Postfix, from userid 47009)
	id 7B79C28006; Mon, 30 Jun 2008 13:55:10 -0700 (PDT)
Date: Mon, 30 Jun 2008 13:55:10 -0700
From: Venki Pallipadi
To: Rik van Riel
Cc: "Pallipadi, Venkatesh", Owen Taylor, "Van De Ven, Arjan"
Subject: Re: cpufreq & wrong ACPI info?
Message-ID: <20080630205510.GA3492@linux-os.sc.intel.com>
References: <20080626203539.42fa1d1b@bree.surriel.com>
	<1214582755.2942.0.camel@localhost.localdomain>
	<7E82351C108FA840AB1866AC776AEC46065EEF06@orsmsx505.amr.corp.intel.com>
	<20080627142941.09d0cfc9@bree.surriel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20080627142941.09d0cfc9@bree.surriel.com>
User-Agent: Mutt/1.4.1i
X-RedHat-Spam-Score: -3.896
X-Scanned-By: MIMEDefang 2.58 on 172.16.52.254
X-Scanned-By: MIMEDefang 2.63 on 172.16.48.32
Content-Transfer-Encoding: 7bit

On Fri, Jun 27, 2008 at 11:29:41AM -0700, Rik van Riel wrote:
> On Fri, 27 Jun 2008 10:08:25 -0700
> "Pallipadi, Venkatesh" wrote:
>
> > BIOS has the P-state transition latency as 500 uS in ACPI _PSS table.
> > That is the reason the default sampling period in ondemand is getting
> > calculated as 500 mS.
>
> How do you extract that info from the acpi dump?
>
> We have observed some other cpufreq issues that may be
> related and I suspect I should take a census of the
> systems in our lab to figure out how common a bad BIOS
> is...
>
> > On this platform, with MSR based frequency switching the real transition
> > latency should be ~ 10 uS and ondemand sampling period should be 20 mS
> > (for HZ = 1000).
>
> I know upstream is moving away from the sampling period
> altogether.  If bad BIOSes are common, we're going to
> have to come up with some thing to do for RHEL.
>
> Maybe key the value off of CPUID, or simply cap the
> default sampling period to a value that should work on
> every moderately recent CPU, in case the BIOS indicates
> a suspiciously high value.
>
> > Owen: Can you just check whether you have the latest BIOS on this box.
> > I am not sure whether we can get the BIOS folks to change this at this
> > point as this is a pretty old platform. May be time for another kernel
> > boot parameter and dmi blacklist?
>
> Either that, or simply do not believe the BIOS when it
> indicates a stupidly high value.  Could capping the
> minimum sampling period to 25ms be good enough for all
> CPUs produced in the last 5 years? :)
>

Below is the patch to cap the latency. Let me know if this resolves the
performance issue you were seeing. I will then send the patch out to the
mailing list.

Thanks,
Venki

---

Some BIOSes report very high frequency transition latencies that are plainly
wrong on CPUs that can change frequency using the native MSR interface. One
such system is the IBM T42 (2327-8ZU), as reported by Owen Taylor and
Rik van Riel.

The cpufreq_ondemand driver uses this transition latency to derive a
reasonable interval for sampling CPU usage. With such a high latency value,
the ondemand sampling interval ends up very high (0.5 sec in this particular
case), which hurts performance because the governor is slow to raise the
frequency under load.

Fix it by capping the transition latency at 20 uS for native MSR based
frequency transitions.

Signed-off-by: Venkatesh Pallipadi

---
 arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

Index: linux-2.6/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c	2008-05-02 09:45:23.000000000 -0700
+++ linux-2.6/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c	2008-06-30 12:08:32.000000000 -0700
@@ -659,6 +659,18 @@ static int acpi_cpufreq_cpu_init(struct 
 			    perf->states[i].transition_latency * 1000;
 	}
 
+	/* Check for high latency (>20 uS) from buggy BIOSes, like on T42 */
+	if (perf->control_register.space_id == ACPI_ADR_SPACE_FIXED_HARDWARE &&
+	    policy->cpuinfo.transition_latency > 20 * 1000) {
+		static int print_once;
+		policy->cpuinfo.transition_latency = 20 * 1000;
+		if (!print_once) {
+			print_once = 1;
+			printk(KERN_INFO "Capping-off P-state transition latency"
+				" at 20 uS\n");
+		}
+	}
+
 	data->max_freq = perf->states[0].core_frequency * 1000;
 	/* table init */
 	for (i=0; i<perf->state_count; i++) {
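
For reference, the arithmetic behind the numbers in this thread can be sketched
in plain userspace C. This is only an illustration, not the ondemand code: the
multiplier of 1000 and the 20 mS floor are assumptions inferred from the figures
quoted above (500 uS giving 500 mS, and ~10 uS giving 20 mS for HZ = 1000), and
every sketch_* name is hypothetical.

```c
#include <assert.h>

/* Assumed constants, inferred from the figures in this thread. They are
 * not copied from the kernel source. */
#define SKETCH_LATENCY_MULTIPLIER 1000U          /* period = latency * 1000 */
#define SKETCH_MIN_SAMPLING_US    20000U         /* 20 mS floor, HZ = 1000 */
#define SKETCH_LATENCY_CAP_NS     (20U * 1000U)  /* 20 uS cap from the patch */

/* Cap a BIOS-reported latency (in nS) the way the patch does, but only for
 * MSR based (fixed hardware) transitions. Hypothetical helper. */
unsigned int sketch_cap_latency_ns(unsigned int latency_ns, int is_fixed_hw)
{
	if (is_fixed_hw && latency_ns > SKETCH_LATENCY_CAP_NS)
		return SKETCH_LATENCY_CAP_NS;
	return latency_ns;
}

/* Derive a default sampling period (in uS) from a latency given in nS.
 * Hypothetical helper. */
unsigned int sketch_sampling_period_us(unsigned int latency_ns)
{
	unsigned int latency_us = latency_ns / 1000U;
	unsigned int period_us;

	/* Assumption: treat sub-microsecond latencies as 1 uS. */
	if (latency_us == 0)
		latency_us = 1;

	period_us = latency_us * SKETCH_LATENCY_MULTIPLIER;
	if (period_us < SKETCH_MIN_SAMPLING_US)
		period_us = SKETCH_MIN_SAMPLING_US;
	return period_us;
}

/* Usage example: reproduce the numbers discussed in the thread. */
void sketch_check_thread_numbers(void)
{
	/* T42 BIOS value: 500 uS latency gives a 500 mS sampling period. */
	assert(sketch_sampling_period_us(500U * 1000U) == 500000U);
	/* A realistic ~10 uS latency lands on the 20 mS floor. */
	assert(sketch_sampling_period_us(10U * 1000U) == 20000U);
	/* With the 20 uS cap applied first, the T42 also gets 20 mS. */
	assert(sketch_sampling_period_us(
		sketch_cap_latency_ns(500U * 1000U, 1)) == 20000U);
}
```

Under these assumptions the 20 uS cap is just large enough to land on the 20 mS
floor, so a capped system should sample at the same rate as one whose BIOS
reports a realistic latency.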