1. 22 Aug 2023, 1 commit
  2. 08 Mar 2023, 1 commit
  3. 13 Dec 2022, 1 commit
  4. 20 Sep 2022, 1 commit
    • x86/speculation: Add RSB VM Exit protections · 3838336f
      Daniel Sneddon authored
      stable inclusion
      from stable-v5.10.136
      commit 509c2c9fe75ea7493eebbb6bb2f711f37530ae19
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5N1SO
      CVE: CVE-2022-26373
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=509c2c9fe75ea7493eebbb6bb2f711f37530ae19
      
      --------------------------------
      
      commit 2b129932 upstream.
      
      tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as
      documented for RET instructions after VM exits. Mitigate it with a new
      one-entry RSB stuffing mechanism and a new LFENCE.
      
      == Background ==
      
      Indirect Branch Restricted Speculation (IBRS) was designed to help
      mitigate Branch Target Injection and Speculative Store Bypass, i.e.
      Spectre, attacks. IBRS prevents software run in less privileged modes
      from affecting branch prediction in more privileged modes. IBRS requires
      the MSR to be written on every privilege level change.
      
      To overcome some of the performance issues of IBRS, Enhanced IBRS was
      introduced.  eIBRS is an "always on" IBRS, in other words, just turn
      it on once instead of writing the MSR on every privilege level change.
      When eIBRS is enabled, more privileged modes should be protected from
      less privileged modes, including protecting VMMs from guests.
      
      == Problem ==
      
      Here's a simplification of how guests are run on Linux' KVM:
      
      void run_kvm_guest(void)
      {
      	// Prepare to run guest
      	VMRESUME();
      	// Clean up after guest runs
      }
      
      The execution flow for that would look something like this to the
      processor:
      
      1. Host-side: call run_kvm_guest()
      2. Host-side: VMRESUME
      3. Guest runs, does "CALL guest_function"
      4. VM exit, host runs again
      5. Host might make some "cleanup" function calls
      6. Host-side: RET from run_kvm_guest()
      
      Now, when back on the host, there are a couple of possible scenarios of
      post-guest activity the host needs to do before executing host code:
      
      * on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not
      touched and Linux has to do a 32-entry stuffing.
      
      * on eIBRS hardware, VM exit with IBRS enabled, or restoring the host
      IBRS=1 shortly after VM exit, has a documented side effect of flushing
      the RSB except in this PBRSB situation where the software needs to stuff
      the last RSB entry "by hand".
      
      IOW, with eIBRS supported, host RET instructions should no longer be
      influenced by guest behavior after the host retires a single CALL
      instruction.
      
      However, if the RET instructions are "unbalanced" with CALLs after a VM
      exit as is the RET in #6, it might speculatively use the address for the
      instruction after the CALL in #3 as an RSB prediction. This is a problem
      since the (untrusted) guest controls this address.
      
      Balanced CALL/RET instruction pairs such as in step #5 are not affected.
      
      == Solution ==
      
      The PBRSB issue affects a wide variety of Intel processors which
      support eIBRS. But not all of them need mitigation. Today,
      X86_FEATURE_RSB_VMEXIT triggers an RSB filling sequence that mitigates
      PBRSB. Systems setting RSB_VMEXIT need no further mitigation - i.e.,
      eIBRS systems which enable legacy IBRS explicitly.
      
      However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RSB_VMEXIT
      and most of them need a new mitigation.
      
      Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE
      which triggers a lighter-weight PBRSB mitigation versus RSB_VMEXIT.
      
      The lighter-weight mitigation performs a CALL instruction which is
      immediately followed by a speculative execution barrier (INT3). This
      steers speculative execution to the barrier -- just like a retpoline
      -- which ensures that speculation can never reach an unbalanced RET.
      Then, ensure this CALL is retired before continuing execution with an
      LFENCE.
      
      In other words, the window of exposure is opened at VM exit where RET
      behavior is troublesome. While the window is open, force RSB predictions
      sampling for RET targets to a dead end at the INT3. Close the window
      with the LFENCE.
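
      As a rough illustration, the sketch below shows the shape of that
      CALL + INT3 + LFENCE sequence as GNU C inline asm (64-bit assumed; the
      label layout and stack fixup are illustrative assumptions, not the
      kernel's actual macro):

      /* Hedged sketch of the lighter-weight PBRSB sequence described above. */
      static inline void rsb_vmexit_lite_sketch(void)
      {
      	asm volatile(
      		"call 1f\n\t"       /* plant one RSB entry pointing at the INT3 */
      		"int3\n"            /* dead end for any RET speculating via the RSB */
      		"1:\n\t"
      		"add $8, %%rsp\n\t" /* undo the CALL's stack push (64-bit) */
      		"lfence"            /* ensure the CALL retires before continuing */
      		::: "memory");
      }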
      
      There is a subset of eIBRS systems which are not vulnerable to PBRSB.
      Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB.
      Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO.
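
      Condensed into a hedged, self-contained sketch (flag names mirror this
      changelog; this is illustrative pseudologic, not the kernel's actual
      enumeration code):

      #include <stdbool.h>

      /* Does this eIBRS part need the lighter-weight VM-exit mitigation? */
      static bool needs_rsb_vmexit_lite(bool eibrs, bool rsb_vmexit,
      				  bool whitelisted_no_eibrs_pbrsb,
      				  bool arch_cap_pbrsb_no)
      {
      	/* eIBRS systems that neither do the full 32-entry stuffing nor
      	 * advertise immunity get X86_FEATURE_RSB_VMEXIT_LITE. */
      	return eibrs && !rsb_vmexit &&
      	       !whitelisted_no_eibrs_pbrsb && !arch_cap_pbrsb_no;
      }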
      
        [ bp: Massage, incorporate review comments from Andy Cooper. ]
      Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
      Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      conflict:
          arch/x86/include/asm/cpufeatures.h
      Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com>
      Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  5. 16 Sep 2022, 5 commits
  6. 05 Jul 2022, 2 commits
  7. 06 Dec 2021, 1 commit
  8. 15 Nov 2021, 1 commit
  9. 03 Jun 2021, 1 commit
  10. 01 Oct 2020, 1 commit
  11. 22 Sep 2020, 1 commit
  12. 09 Sep 2020, 2 commits
  13. 15 Aug 2020, 1 commit
  14. 06 Aug 2020, 1 commit
    • locking/seqlock, headers: Untangle the spaghetti monster · 0cd39f46
      Peter Zijlstra authored
      By using lockdep_assert_*() from seqlock.h, the spaghetti monster
      attacked.
      
      Attack back by reducing seqlock.h dependencies from two key high level headers:
      
       - <linux/seqlock.h>:               -Remove <linux/ww_mutex.h>
       - <linux/time.h>:                  -Remove <linux/seqlock.h>
       - <linux/sched.h>:                 +Add    <linux/seqlock.h>
      
      The price was to add it to sched.h ...
      
      Core header fallout, we add direct header dependencies instead of gaining them
      parasitically from higher level headers:
      
       - <linux/dynamic_queue_limits.h>:  +Add <asm/bug.h>
       - <linux/hrtimer.h>:               +Add <linux/seqlock.h>
       - <linux/ktime.h>:                 +Add <asm/bug.h>
       - <linux/lockdep.h>:               +Add <linux/smp.h>
       - <linux/sched.h>:                 +Add <linux/seqlock.h>
       - <linux/videodev2.h>:             +Add <linux/kernel.h>
      
      Arch headers fallout:
      
       - PARISC: <asm/timex.h>:           +Add <asm/special_insns.h>
       - SH:     <asm/io.h>:              +Add <asm/page.h>
       - SPARC:  <asm/timer_64.h>:        +Add <uapi/asm/asi.h>
       - SPARC:  <asm/vvar.h>:            +Add <asm/processor.h>, <asm/barrier.h>
                                          -Remove <linux/seqlock.h>
       - X86:    <asm/fixmap.h>:          +Add <asm/pgtable_types.h>
                                          -Remove <asm/acpi.h>
      
      There's also a bunch of parasitic header dependency fallout in .c files, not listed
      separately.
      
      [ mingo: Extended the changelog, split up & fixed the original patch. ]
      Co-developed-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20200804133438.GK2674@hirez.programming.kicks-ass.net
  15. 18 Jun 2020, 4 commits
  16. 11 Jun 2020, 2 commits
  17. 10 Jun 2020, 2 commits
    • mm: reorder includes after introduction of linux/pgtable.h · 65fddcfc
      Mike Rapoport authored
      The replacement of <asm/pgtable.h> with <linux/pgtable.h> left the include
      of the latter in the middle of the asm includes.  Fix this up with the aid
      of the below script and manual adjustments here and there.
      
      	import sys
      	import re
      
      	if len(sys.argv) is not 3:
      	    print "USAGE: %s <file> <header>" % (sys.argv[0])
      	    sys.exit(1)
      
      	hdr_to_move="#include <linux/%s>" % sys.argv[2]
      	moved = False
      	in_hdrs = False
      
      	with open(sys.argv[1], "r") as f:
      	    lines = f.readlines()
      	    for _line in lines:
      		line = _line.rstrip('\n')
      		if line == hdr_to_move:
      		    continue
      		if line.startswith("#include <linux/"):
      		    in_hdrs = True
      		elif not moved and in_hdrs:
      		    moved = True
      		    print hdr_to_move
      		print line
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: introduce include/linux/pgtable.h · ca5999fd
      Mike Rapoport authored
      The include/linux/pgtable.h is going to be the home of generic page table
      manipulation functions.
      
      Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
      make the latter include asm/pgtable.h.
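
      Conceptually, the new header then ends up as little more than a thin
      wrapper around the arch header plus the relocated generic helpers (a
      hedged sketch of the shape, not the verbatim file):

      /* include/linux/pgtable.h - sketch only */
      #ifndef _LINUX_PGTABLE_H
      #define _LINUX_PGTABLE_H

      #include <asm/pgtable.h>	/* arch-specific page table definitions */

      /* ... generic page table helpers formerly in asm-generic/pgtable.h ... */

      #endif /* _LINUX_PGTABLE_H */
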
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 06 May 2020, 3 commits
  19. 27 Apr 2020, 1 commit
  20. 25 Apr 2020, 1 commit
  21. 20 Apr 2020, 2 commits
  22. 25 Mar 2020, 1 commit
  23. 28 Feb 2020, 1 commit
    • x86/pkeys: Manually set X86_FEATURE_OSPKE to preserve existing changes · 735a6dd0
      Sean Christopherson authored
      Explicitly set X86_FEATURE_OSPKE via set_cpu_cap() instead of calling
      get_cpu_cap() to pull the feature bit from CPUID after enabling CR4.PKE.
      Invoking get_cpu_cap() effectively wipes out any {set,clear}_cpu_cap()
      changes that were made between this_cpu->c_init() and setup_pku(), as
      all non-synthetic feature words are reinitialized from the CPU's CPUID
      values.
      
      Blasting away capability updates manifests most visibly when running
      on a VMX-capable CPU, but with VMX disabled by BIOS.  To indicate that
      VMX is disabled, init_ia32_feat_ctl() clears X86_FEATURE_VMX, using
      clear_cpu_cap() instead of setup_clear_cpu_cap() so that KVM can report
      which CPU is misconfigured (KVM needs to probe every CPU anyways).
      Restoring X86_FEATURE_VMX from CPUID causes KVM to think VMX is enabled,
      ultimately leading to an unexpected #GP when KVM attempts to do VMXON.
      
      Arguably, init_ia32_feat_ctl() should use setup_clear_cpu_cap() and let
      KVM figure out a different way to report the misconfigured CPU, but VMX
      is not the only feature bit that is affected, i.e. there is precedent
      that tweaking feature bits via {set,clear}_cpu_cap() after ->c_init()
      is expected to work.  Most notably, x86_init_rdrand()'s clearing of
      X86_FEATURE_RDRAND when RDRAND malfunctions is also overwritten.
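
      A hedged sketch of the shape of the fix (setup_pku() condensed; the
      surrounding checks are illustrative, only the final line is the point):

      /* Condensed sketch, not the literal kernel code. */
      static void setup_pku(struct cpuinfo_x86 *c)
      {
      	if (!cpu_has(c, X86_FEATURE_PKU))
      		return;

      	cr4_set_bits(X86_CR4_PKE);
      	/*
      	 * Previously: get_cpu_cap(c), which re-reads CPUID and wipes out
      	 * earlier {set,clear}_cpu_cap() adjustments.  Flip only the bit
      	 * that actually changed instead.
      	 */
      	set_cpu_cap(c, X86_FEATURE_OSPKE);
      }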
      
      Fixes: 06976945 ("x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU")
      Reported-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Tested-by: Jacob Keller <jacob.e.keller@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20200226231615.13664-1-sean.j.christopherson@intel.com
  24. 21 Feb 2020, 1 commit
    • x86/split_lock: Enable split lock detection by kernel · 6650cdd9
      Peter Zijlstra (Intel) authored
      A split-lock occurs when an atomic instruction operates on data that spans
      two cache lines. In order to maintain atomicity the core takes a global bus
      lock.
      
      This is typically >1000 cycles slower than an atomic operation within a
      cache line. It also disrupts performance on other cores (which must wait
      for the bus lock to be released before their memory operations can
      complete). For real-time systems this may mean missing deadlines. For other
      systems it may just be very annoying.
      
      Some CPUs have the capability to raise an #AC trap when a split lock is
      attempted.
      
      Provide a command line option to give the user choices on how to handle
      this:
      
      split_lock_detect=
      	off	- not enabled (no traps for split locks)
      	warn	- warn once when an application does a
      		  split lock, but allow it to continue
      		  running.
      	fatal	- send SIGBUS to applications that cause a split lock.
      
      On systems that support split lock detection the default is "warn". Note
      that if the kernel hits a split lock in any mode other than "off" it will
      oops.
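
      For illustration, a hedged sketch of the "warn" path (condensed; the
      sld_* names and the MSR helper are assumptions drawn from this
      changelog, not the literal #AC handler):

      /* Sketch only: invoked from the #AC handler when a user task split-locks. */
      static bool handle_split_lock_warn(void)
      {
      	if (sld_state != sld_warn)
      		return false;	/* "off" never traps; "fatal" sends SIGBUS */

      	pr_warn_ratelimited("split lock detected in %s/%d\n",
      			    current->comm, current->pid);

      	/*
      	 * The detect bit lives in a per-core MSR, so clearing it also
      	 * affects the HT sibling; see the race analysis below.
      	 */
      	sld_msr_set(false);			/* assumed helper: clear the enable bit */
      	set_tsk_thread_flag(current, TIF_SLD);	/* re-enable when this task schedules out */

      	return true;	/* retry the instruction with detection off */
      }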
      
      One implementation wrinkle is that the MSR to control the split lock
      detection is per-core, not per thread. This might result in some short
      lived races on HT systems in "warn" mode if Linux tries to enable on one
      thread while disabling on the other. Race analysis by Sean Christopherson:
      
        - Toggling of split-lock is only done in "warn" mode.  Worst case
          scenario of a race is that a misbehaving task will generate multiple
          #AC exceptions on the same instruction.  And this race will only occur
          if both siblings are running tasks that generate split-lock #ACs, e.g.
          a race where sibling threads are writing different values will only
          occur if CPUx is disabling split-lock after an #AC and CPUy is
          re-enabling split-lock after *its* previous task generated an #AC.
        - Transitioning between off/warn/fatal modes at runtime isn't supported
          and disabling is tracked per task, so hardware will always reach a steady
          state that matches the configured mode.  I.e. split-lock is guaranteed to
          be enabled in hardware once all _TIF_SLD threads have been scheduled out.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Co-developed-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lore.kernel.org/r/20200126200535.GB30377@agluck-desk2.amr.corp.intel.com
  25. 24 Jan 2020, 1 commit
    • x86/mpx: remove MPX from arch/x86 · 45fc24e8
      Dave Hansen authored
      From: Dave Hansen <dave.hansen@linux.intel.com>
      
      MPX is being removed from the kernel due to a lack of support
      in the toolchain going forward (gcc).
      
      This removes all the (now dead) MPX handling code remaining in the
      tree.  The only code left is the XSAVE support for MPX state, which is
      currently needed for KVM to handle VMs which might use MPX.
      
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: x86@kernel.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
  26. 18 Jan 2020, 1 commit