1.27.0 Module core::arch::x86
Platform-specific intrinsics for the x86 platform.
See the module documentation for more details.
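As a rough illustration of how the intrinsics listed on this page are typically called, the sketch below adds two arrays of eight `f32` values with `_mm256_loadu_ps`, `_mm256_add_ps` and `_mm256_storeu_ps`. It assumes `std` is available, because run-time detection uses the `is_x86_feature_detected!` macro. The helper names `add8` and `add8_avx` are hypothetical and not part of this module.

```rust
/// Adds two 8-element `f32` arrays lane by lane.
/// `add8` is a hypothetical helper name used only for this sketch.
pub fn add8(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was verified at run time just above.
            return unsafe { add8_avx(a, b) };
        }
    }
    // Portable scalar fallback for CPUs (or targets) without AVX.
    let mut out = [0.0f32; 8];
    for i in 0..8 {
        out[i] = a[i] + b[i];
    }
    out
}

/// AVX implementation; must only be called when AVX is available.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn add8_avx(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;

    let mut out = [0.0f32; 8];
    unsafe {
        // Unaligned loads: arrays of f32 are not guaranteed to be 32-byte aligned.
        let va = _mm256_loadu_ps(a.as_ptr());
        let vb = _mm256_loadu_ps(b.as_ptr());
        let sum = _mm256_add_ps(va, vb);
        _mm256_storeu_ps(out.as_mut_ptr(), sum);
    }
    out
}
```

Keeping the intrinsic calls inside a `#[target_feature]` function and guarding the call site with a run-time check is one common pattern; compiling the whole crate with the feature enabled is another.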
Structs
CpuidResult  x86 Result of the 
__m128i  x86 128-bit wide integer vector type, x86-specific
__m128  x86 128-bit wide set of four
__m128d  x86 128-bit wide set of two
__m256i  x86 256-bit wide integer vector type, x86-specific
__m256  x86 256-bit wide set of eight
__m256d  x86 256-bit wide set of four
__m64  Experimental x86 64-bit wide integer vector type, x86-specific
__m512i  Experimental x86 512-bit wide integer vector type, x86-specific
__m512  Experimental x86 512-bit wide set of sixteen
__m512d  Experimental x86 512-bit wide set of eight
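The widths quoted for the stable vector types can be checked with `size_of`. The following is a small sketch, assuming `std` is available; `vector_widths` is a hypothetical helper name, and on non-x86 targets it simply returns an empty list.

```rust
use std::mem::size_of;

#[cfg(target_arch = "x86")]
use std::arch::x86::{__m128, __m128d, __m128i, __m256, __m256d, __m256i};
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128, __m128d, __m128i, __m256, __m256d, __m256i};

/// Returns `(type name, width in bits)` for the stable x86 vector types.
/// `vector_widths` is a hypothetical helper used only for illustration.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn vector_widths() -> Vec<(&'static str, usize)> {
    vec![
        ("__m128", size_of::<__m128>() * 8),
        ("__m128d", size_of::<__m128d>() * 8),
        ("__m128i", size_of::<__m128i>() * 8),
        ("__m256", size_of::<__m256>() * 8),
        ("__m256d", size_of::<__m256d>() * 8),
        ("__m256i", size_of::<__m256i>() * 8),
    ]
}

/// Fallback for targets where these types do not exist.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn vector_widths() -> Vec<(&'static str, usize)> {
    Vec::new()
}
```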
Constants
_CMP_EQ_OQ  x86 Equal (ordered, non-signaling)
_CMP_EQ_OS  x86 Equal (ordered, signaling)
_CMP_EQ_UQ  x86 Equal (unordered, non-signaling)
_CMP_EQ_US  x86 Equal (unordered, signaling)
_CMP_FALSE_OQ  x86 False (ordered, non-signaling)
_CMP_FALSE_OS  x86 False (ordered, signaling)
_CMP_GE_OQ  x86 Greater-than-or-equal (ordered, non-signaling)
_CMP_GE_OS  x86 Greater-than-or-equal (ordered, signaling)
_CMP_GT_OQ  x86 Greater-than (ordered, non-signaling)
_CMP_GT_OS  x86 Greater-than (ordered, signaling)
_CMP_LE_OQ  x86 Less-than-or-equal (ordered, non-signaling)
_CMP_LE_OS  x86 Less-than-or-equal (ordered, signaling)
_CMP_LT_OQ  x86 Less-than (ordered, non-signaling)
_CMP_LT_OS  x86 Less-than (ordered, signaling)
_CMP_NEQ_OQ  x86 Not-equal (ordered, non-signaling)
_CMP_NEQ_OS  x86 Not-equal (ordered, signaling)
_CMP_NEQ_UQ  x86 Not-equal (unordered, non-signaling)
_CMP_NEQ_US  x86 Not-equal (unordered, signaling)
_CMP_NGE_UQ  x86 Not-greater-than-or-equal (unordered, non-signaling)
_CMP_NGE_US  x86 Not-greater-than-or-equal (unordered, signaling)
_CMP_NGT_UQ  x86 Not-greater-than (unordered, non-signaling)
_CMP_NGT_US  x86 Not-greater-than (unordered, signaling)
_CMP_NLE_UQ  x86 Not-less-than-or-equal (unordered, non-signaling)
_CMP_NLE_US  x86 Not-less-than-or-equal (unordered, signaling)
_CMP_NLT_UQ  x86 Not-less-than (unordered, non-signaling)
_CMP_NLT_US  x86 Not-less-than (unordered, signaling)
_CMP_ORD_Q  x86 Ordered (non-signaling)
_CMP_ORD_S  x86 Ordered (signaling)
_CMP_TRUE_UQ  x86 True (unordered, non-signaling)
_CMP_TRUE_US  x86 True (unordered, signaling)
_CMP_UNORD_Q  x86 Unordered (non-signaling)
_CMP_UNORD_S  x86 Unordered (signaling)
_MM_EXCEPT_DENORM  x86 See 
_MM_EXCEPT_DIV_ZERO  x86 See 
_MM_EXCEPT_INEXACT  x86 See 
_MM_EXCEPT_INVALID  x86 See 
_MM_EXCEPT_MASK  x86 
_MM_EXCEPT_OVERFLOW  x86 See 
_MM_EXCEPT_UNDERFLOW  x86 See 
_MM_FLUSH_ZERO_MASK  x86 
_MM_FLUSH_ZERO_OFF  x86 See 
_MM_FLUSH_ZERO_ON  x86 See 
_MM_FROUND_CEIL  x86 round up and do not suppress exceptions 
_MM_FROUND_CUR_DIRECTION  x86 use MXCSR.RC; see 
_MM_FROUND_FLOOR  x86 round down and do not suppress exceptions 
_MM_FROUND_NEARBYINT  x86 use MXCSR.RC and suppress exceptions; see 
_MM_FROUND_NINT  x86 round to nearest and do not suppress exceptions 
_MM_FROUND_NO_EXC  x86 suppress exceptions 
_MM_FROUND_RAISE_EXC  x86 do not suppress exceptions 
_MM_FROUND_RINT  x86 use MXCSR.RC and do not suppress exceptions; see

_MM_FROUND_TO_NEAREST_INT  x86 round to nearest 
_MM_FROUND_TO_NEG_INF  x86 round down 
_MM_FROUND_TO_POS_INF  x86 round up 
_MM_FROUND_TO_ZERO  x86 truncate 
_MM_FROUND_TRUNC  x86 truncate and do not suppress exceptions 
_MM_HINT_NTA  x86 See 
_MM_HINT_T0  x86 See 
_MM_HINT_T1  x86 See 
_MM_HINT_T2  x86 See 
_MM_MASK_DENORM  x86 See 
_MM_MASK_DIV_ZERO  x86 See 
_MM_MASK_INEXACT  x86 See 
_MM_MASK_INVALID  x86 See 
_MM_MASK_MASK  x86 
_MM_MASK_OVERFLOW  x86 See 
_MM_MASK_UNDERFLOW  x86 See 
_MM_ROUND_DOWN  x86 See 
_MM_ROUND_MASK  x86 
_MM_ROUND_NEAREST  x86 See 
_MM_ROUND_TOWARD_ZERO  x86 See 
_MM_ROUND_UP  x86 See 
_SIDD_BIT_MASK  x86 Mask only: return the bit mask 
_SIDD_CMP_EQUAL_ANY  x86 For each character in 
_SIDD_CMP_EQUAL_EACH  x86 The strings defined by 
_SIDD_CMP_EQUAL_ORDERED  x86 Search for the defined substring in the target 
_SIDD_CMP_RANGES  x86 For each character in 
_SIDD_LEAST_SIGNIFICANT  x86 Index only: return the least significant bit (Default) 
_SIDD_MASKED_NEGATIVE_POLARITY  x86 Negate results only before the end of the string 
_SIDD_MASKED_POSITIVE_POLARITY  x86 Do not negate results before the end of the string 
_SIDD_MOST_SIGNIFICANT  x86 Index only: return the most significant bit 
_SIDD_NEGATIVE_POLARITY  x86 Negate results 
_SIDD_POSITIVE_POLARITY  x86 Do not negate results (Default) 
_SIDD_SBYTE_OPS  x86 String contains signed 8-bit characters
_SIDD_SWORD_OPS  x86 String contains signed 16-bit characters
_SIDD_UBYTE_OPS  x86 String contains unsigned 8-bit characters (Default)
_SIDD_UNIT_MASK  x86 Mask only: return the byte mask
_SIDD_UWORD_OPS  x86 String contains unsigned 16-bit characters
_XCR_XFEATURE_ENABLED_MASK  x86
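The descriptions of the `_MM_FROUND_*` constants suggest that the composite modes are bitwise ORs of a rounding direction and an exception flag (for example, `_MM_FROUND_CEIL` is "round up" combined with "do not suppress exceptions"). The sketch below pairs each composite constant with that combination so the relationship can be checked. `composite_rounding_modes` is a hypothetical helper name, and on non-x86 targets it returns an empty list.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Pairs each composite rounding constant with the OR of its parts.
/// `composite_rounding_modes` is a hypothetical helper for illustration.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn composite_rounding_modes() -> Vec<(i32, i32)> {
    vec![
        // round to nearest, do not suppress exceptions
        (_MM_FROUND_NINT, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC),
        // round down, do not suppress exceptions
        (_MM_FROUND_FLOOR, _MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC),
        // round up, do not suppress exceptions
        (_MM_FROUND_CEIL, _MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC),
        // truncate, do not suppress exceptions
        (_MM_FROUND_TRUNC, _MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC),
        // use MXCSR.RC, do not suppress exceptions
        (_MM_FROUND_RINT, _MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC),
        // use MXCSR.RC, suppress exceptions
        (_MM_FROUND_NEARBYINT, _MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC),
    ]
}

/// Fallback for targets where these constants do not exist.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn composite_rounding_modes() -> Vec<(i32, i32)> {
    Vec::new()
}
```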

Functions
_MM_GET_EXCEPTION_MASK^{⚠}  x86 and sse See 
_MM_GET_EXCEPTION_STATE^{⚠}  x86 and sse See 
_MM_GET_FLUSH_ZERO_MODE^{⚠}  x86 and sse See 
_MM_GET_ROUNDING_MODE^{⚠}  x86 and sse See 
_MM_SET_EXCEPTION_MASK^{⚠}  x86 and sse See 
_MM_SET_EXCEPTION_STATE^{⚠}  x86 and sse See 
_MM_SET_FLUSH_ZERO_MODE^{⚠}  x86 and sse See 
_MM_SET_ROUNDING_MODE^{⚠}  x86 and sse See 
_MM_TRANSPOSE4_PS^{⚠}  x86 and sse Transpose the 4x4 matrix formed by 4 rows of __m128 in place. 
__cpuid^{⚠}  x86 See 
__cpuid_count^{⚠}  x86 Returns the result of the 
__get_cpuid_max^{⚠}  x86 Returns the highest-supported
__rdtscp^{⚠}  x86 Reads the current value of the processor’s timestamp counter and
the 
_addcarry_u32^{⚠}  x86 Add unsigned 32-bit integers a and b with unsigned 8-bit carry-in
_addcarryx_u32^{⚠}  x86 and adx Add unsigned 32-bit integers a and b with unsigned 8-bit carry-in
_andn_u32^{⚠}  x86 and bmi1 Bitwise logical 
_bextr2_u32^{⚠}  x86 and bmi1 Extracts bits of 
_bextr_u32^{⚠}  x86 and bmi1 Extracts bits in range [ 
_blcfill_u32^{⚠}  x86 and tbm Clears all bits below the least significant zero bit of 
_blcfill_u64^{⚠}  x86 and tbm Clears all bits below the least significant zero bit of 
_blci_u32^{⚠}  x86 and tbm Sets all bits of 
_blci_u64^{⚠}  x86 and tbm Sets all bits of 
_blcic_u32^{⚠}  x86 and tbm Sets the least significant zero bit of 
_blcic_u64^{⚠}  x86 and tbm Sets the least significant zero bit of 
_blcmsk_u32^{⚠}  x86 and tbm Sets the least significant zero bit of 
_blcmsk_u64^{⚠}  x86 and tbm Sets the least significant zero bit of 
_blcs_u32^{⚠}  x86 and tbm Sets the least significant zero bit of 
_blcs_u64^{⚠}  x86 and tbm Sets the least significant zero bit of 
_blsfill_u32^{⚠}  x86 and tbm Sets all bits of 
_blsfill_u64^{⚠}  x86 and tbm Sets all bits of 
_blsi_u32^{⚠}  x86 and bmi1 Extract lowest set isolated bit. 
_blsic_u32^{⚠}  x86 and tbm Clears least significant bit and sets all other bits. 
_blsic_u64^{⚠}  x86 and tbm Clears least significant bit and sets all other bits. 
_blsmsk_u32^{⚠}  x86 and bmi1 Get mask up to lowest set bit. 
_blsr_u32^{⚠}  x86 and bmi1 Resets the lowest set bit of 
_bswap^{⚠}  x86 Return an integer with the reversed byte order of x 
_bzhi_u32^{⚠}  x86 and bmi2 Zero higher bits of 
_fxrstor^{⚠}  x86 and fxsr Restores the 
_fxsave^{⚠}  x86 and fxsr Saves the 
_lzcnt_u32^{⚠}  x86 and lzcnt Counts the leading most significant zero bits. 
_mm256_add_pd^{⚠}  x86 and avx Add packed double-precision (64-bit) floating-point elements in
_mm256_add_ps^{⚠}  x86 and avx Add packed single-precision (32-bit) floating-point elements in
_mm256_and_pd^{⚠}  x86 and avx Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in
_mm256_and_ps^{⚠}  x86 and avx Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in
_mm256_or_pd^{⚠}  x86 and avx Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in
_mm256_or_ps^{⚠}  x86 and avx Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in
_mm256_shuffle_pd^{⚠}  x86 and avx Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in
_mm256_shuffle_ps^{⚠}  x86 and avx Shuffle single-precision (32-bit) floating-point elements in
_mm256_andnot_pd^{⚠}  x86 and avx Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in
_mm256_andnot_ps^{⚠}  x86 and avx Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in
_mm256_max_pd^{⚠}  x86 and avx Compare packed double-precision (64-bit) floating-point elements in
_mm256_max_ps^{⚠}  x86 and avx Compare packed single-precision (32-bit) floating-point elements in
_mm256_min_pd^{⚠}  x86 and avx Compare packed double-precision (64-bit) floating-point elements in
_mm256_min_ps^{⚠}  x86 and avx Compare packed single-precision (32-bit) floating-point elements in
_mm256_mul_pd^{⚠}  x86 and avx Multiply packed double-precision (64-bit) floating-point elements in
_mm256_mul_ps^{⚠}  x86 and avx Multiply packed single-precision (32-bit) floating-point elements in
_mm256_addsub_pd^{⚠}  x86 and avx Alternately add and subtract packed double-precision (64-bit) floating-point elements in
_mm256_addsub_ps^{⚠}  x86 and avx Alternately add and subtract packed single-precision (32-bit) floating-point elements in
_mm256_sub_pd^{⚠}  x86 and avx Subtract packed double-precision (64-bit) floating-point elements in
_mm256_sub_ps^{⚠}  x86 and avx Subtract packed single-precision (32-bit) floating-point elements in
_mm256_div_ps^{⚠}  x86 and avx Compute the division of each of the 8 packed 32-bit floating-point elements in
_mm256_div_pd^{⚠}  x86 and avx Compute the division of each of the 4 packed 64-bit floating-point elements in
_mm256_round_pd^{⚠}  x86 and avx Round packed double-precision (64-bit) floating-point elements in
_mm256_ceil_pd^{⚠}  x86 and avx Round packed double-precision (64-bit) floating-point elements in
_mm256_floor_pd^{⚠}  x86 and avx Round packed double-precision (64-bit) floating-point elements in
_mm256_round_ps^{⚠}  x86 and avx Round packed single-precision (32-bit) floating-point elements in
_mm256_ceil_ps^{⚠}  x86 and avx Round packed single-precision (32-bit) floating-point elements in
_mm256_floor_ps^{⚠}  x86 and avx Round packed single-precision (32-bit) floating-point elements in
_mm256_sqrt_ps^{⚠}  x86 and avx Return the square root of packed single-precision (32-bit) floating-point elements in
_mm256_sqrt_pd^{⚠}  x86 and avx Return the square root of packed double-precision (64-bit) floating-point elements in
_mm256_blend_pd^{⚠}  x86 and avx Blend packed double-precision (64-bit) floating-point elements from
_mm256_blend_ps^{⚠}  x86 and avx Blend packed single-precision (32-bit) floating-point elements from
_mm256_blendv_pd^{⚠}  x86 and avx Blend packed double-precision (64-bit) floating-point elements from
_mm256_blendv_ps^{⚠}  x86 and avx Blend packed single-precision (32-bit) floating-point elements from
_mm256_dp_ps^{⚠}  x86 and avx Conditionally multiply the packed single-precision (32-bit) floating-point elements in
_mm256_hadd_pd^{⚠}  x86 and avx Horizontal addition of adjacent pairs in the two packed vectors of 4 64-bit floating points
_mm256_hadd_ps^{⚠}  x86 and avx Horizontal addition of adjacent pairs in the two packed vectors of 8 32-bit floating points
_mm256_hsub_pd^{⚠}  x86 and avx Horizontal subtraction of adjacent pairs in the two packed vectors of 4 64-bit floating points
_mm256_hsub_ps^{⚠}  x86 and avx Horizontal subtraction of adjacent pairs in the two packed vectors of 8 32-bit floating points
_mm256_xor_pd^{⚠}  x86 and avx Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in
_mm256_xor_ps^{⚠}  x86 and avx Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in
_mm256_cmp_pd^{⚠}  x86 and avx Compare packed double-precision (64-bit) floating-point elements in
_mm256_cmp_ps^{⚠}  x86 and avx Compare packed single-precision (32-bit) floating-point elements in
_mm256_cvtpd_ps^{⚠}  x86 and avx Convert packed double-precision (64-bit) floating-point elements in
_mm256_cvtps_pd^{⚠}  x86 and avx Convert packed single-precision (32-bit) floating-point elements in
_mm256_zeroall^{⚠}  x86 and avx Zero the contents of all XMM or YMM registers. 
_mm256_zeroupper^{⚠}  x86 and avx Zero the upper 128 bits of all YMM registers; the lower 128 bits of the registers are unmodified.
_mm256_permutevar_ps^{⚠}  x86 and avx Shuffle single-precision (32-bit) floating-point elements in
_mm256_permute_ps^{⚠}  x86 and avx Shuffle single-precision (32-bit) floating-point elements in
_mm256_permutevar_pd^{⚠}  x86 and avx Shuffle double-precision (64-bit) floating-point elements in
_mm256_permute_pd^{⚠}  x86 and avx Shuffle double-precision (64-bit) floating-point elements in
_mm256_broadcast_ss^{⚠}  x86 and avx Broadcast a single-precision (32-bit) floating-point element from memory to all elements of the returned vector.
_mm256_broadcast_sd^{⚠}  x86 and avx Broadcast a double-precision (64-bit) floating-point element from memory to all elements of the returned vector.
_mm256_broadcast_ps^{⚠}  x86 and avx Broadcast 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements of the returned vector.
_mm256_broadcast_pd^{⚠}  x86 and avx Broadcast 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements of the returned vector.
_mm256_load_pd^{⚠}  x86 and avx Load 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into result.
_mm256_store_pd^{⚠}  x86 and avx Store 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from
_mm256_load_ps^{⚠}  x86 and avx Load 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into result.
_mm256_store_ps^{⚠}  x86 and avx Store 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from
_mm256_loadu_pd^{⚠}  x86 and avx Load 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into result.
_mm256_storeu_pd^{⚠}  x86 and avx Store 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from
_mm256_loadu_ps^{⚠}  x86 and avx Load 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into result.
_mm256_storeu_ps^{⚠}  x86 and avx Store 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from
_mm256_maskload_pd^{⚠}  x86 and avx Load packed double-precision (64-bit) floating-point elements from memory into result using
_mm256_maskstore_pd^{⚠}  x86 and avx Store packed double-precision (64-bit) floating-point elements from
_mm256_maskload_ps^{⚠}  x86 and avx Load packed single-precision (32-bit) floating-point elements from memory into result using
_mm256_maskstore_ps^{⚠}  x86 and avx Store packed single-precision (32-bit) floating-point elements from
_mm256_movehdup_ps^{⚠}  x86 and avx Duplicate odd-indexed single-precision (32-bit) floating-point elements from
_mm256_moveldup_ps^{⚠}  x86 and avx Duplicate even-indexed single-precision (32-bit) floating-point elements from
_mm256_movedup_pd^{⚠}  x86 and avx Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and return the results.
_mm256_stream_pd^{⚠}  x86 and avx Moves double-precision values from a 256-bit vector of
_mm256_stream_ps^{⚠}  x86 and avx Moves single-precision floating-point values from a 256-bit vector of
_mm256_rcp_ps^{⚠}  x86 and avx Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in
_mm256_rsqrt_ps^{⚠}  x86 and avx Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in
_mm256_unpackhi_pd^{⚠}  x86 and avx Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in
_mm256_unpackhi_ps^{⚠}  x86 and avx Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in
_mm256_unpacklo_pd^{⚠}  x86 and avx Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in
_mm256_unpacklo_ps^{⚠}  x86 and avx Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in
_mm256_testz_pd^{⚠}  x86 and avx Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in
_mm256_testc_pd^{⚠}  x86 and avx Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in
_mm256_testnzc_pd^{⚠}  x86 and avx Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in
_mm256_testz_ps^{⚠}  x86 and avx Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in
_mm256_testc_ps^{⚠}  x86 and avx Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in
_mm256_testnzc_ps^{⚠}  x86 and avx Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in
_mm256_movemask_pd^{⚠}  x86 and avx Set each bit of the returned mask based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in
_mm256_movemask_ps^{⚠}  x86 and avx Set each bit of the returned mask based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in
_mm256_setzero_pd^{⚠}  x86 and avx Return vector of type __m256d with all elements set to zero. 
_mm256_setzero_ps^{⚠}  x86 and avx Return vector of type __m256 with all elements set to zero. 
_mm256_set_pd^{⚠}  x86 and avx Set packed double-precision (64-bit) floating-point elements in returned vector with the supplied values.
_mm256_set_ps^{⚠}  x86 and avx Set packed single-precision (32-bit) floating-point elements in returned vector with the supplied values.
_mm256_setr_pd^{⚠}  x86 and avx Set packed double-precision (64-bit) floating-point elements in returned vector with the supplied values in reverse order.
_mm256_setr_ps^{⚠}  x86 and avx Set packed single-precision (32-bit) floating-point elements in returned vector with the supplied values in reverse order.
_mm256_castpd_ps^{⚠}  x86 and avx Cast vector of type __m256d to type __m256. 
_mm256_castps_pd^{⚠}  x86 and avx Cast vector of type __m256 to type __m256d. 
_mm256_undefined_ps^{⚠}  x86 and avx Return vector of type 
_mm256_undefined_pd^{⚠}  x86 and avx Return vector of type 
_mm256_broadcastsd_pd^{⚠}  x86 and avx2 Broadcast the low double-precision (64-bit) floating-point element from
_mm256_broadcastss_ps^{⚠}  x86 and avx2 Broadcast the low single-precision (32-bit) floating-point element from
_mm256_fmadd_pd^{⚠}  x86 and fma Multiply packed double-precision (64-bit) floating-point elements in
_mm256_fmadd_ps^{⚠}  x86 and fma Multiply packed single-precision (32-bit) floating-point elements in
_mm256_fmaddsub_pd^{⚠}  x86 and fma Multiply packed double-precision (64-bit) floating-point elements in
_mm256_fmaddsub_ps^{⚠}  x86 and fma Multiply packed single-precision (32-bit) floating-point elements in
_mm256_fmsub_pd^{⚠}  x86 and fma Multiply packed double-precision (64-bit) floating-point elements in
_mm256_fmsub_ps^{⚠}  x86 and fma Multiply packed single-precision (32-bit) floating-point elements in
_mm256_fmsubadd_pd^{⚠}  x86 and fma Multiply packed double-precision (64-bit) floating-point elements in
_mm256_fmsubadd_ps^{⚠}  x86 and fma Multiply packed single-precision (32-bit) floating-point elements in
_mm256_fnmadd_pd^{⚠}  x86 and fma Multiply packed double-precision (64-bit) floating-point elements in
_mm256_fnmadd_ps^{⚠}  x86 and fma Multiply packed single-precision (32-bit) floating-point elements in
_mm256_fnmsub_pd^{⚠}  x86 and fma Multiply packed double-precision (64-bit) floating-point elements in
_mm256_fnmsub_ps^{⚠}  x86 and fma Multiply packed single-precision (32-bit) floating-point elements in
_mm256_abs_epi8^{⚠}  x86 and avx2 Computes the absolute values of packed 8-bit integers in
_mm256_abs_epi16^{⚠}  x86 and avx2 Computes the absolute values of packed 16-bit integers in
_mm256_abs_epi32^{⚠}  x86 and avx2 Computes the absolute values of packed 32-bit integers in
_mm256_add_epi8^{⚠}  x86 and avx2 Add packed 8-bit integers in
_mm256_add_epi16^{⚠}  x86 and avx2 Add packed 16-bit integers in
_mm256_add_epi32^{⚠}  x86 and avx2 Add packed 32-bit integers in
_mm256_add_epi64^{⚠}  x86 and avx2 Add packed 64-bit integers in
_mm256_adds_epi8^{⚠}  x86 and avx2 Add packed 8-bit integers in
_mm256_adds_epi16^{⚠}  x86 and avx2 Add packed 16-bit integers in
_mm256_adds_epu8^{⚠}  x86 and avx2 Add packed unsigned 8-bit integers in
_mm256_adds_epu16^{⚠}  x86 and avx2 Add packed unsigned 16-bit integers in
_mm256_alignr_epi8^{⚠}  x86 and avx2 Concatenate pairs of 16-byte blocks in
_mm256_and_si256^{⚠}  x86 and avx2 Compute the bitwise AND of 256 bits (representing integer data) in
_mm256_andnot_si256^{⚠}  x86 and avx2 Compute the bitwise NOT of 256 bits (representing integer data) in
_mm256_avg_epu8^{⚠}  x86 and avx2 Average packed unsigned 8-bit integers in
_mm256_avg_epu16^{⚠}  x86 and avx2 Average packed unsigned 16-bit integers in
_mm256_blend_epi16^{⚠}  x86 and avx2 Blend packed 16-bit integers from
_mm256_blend_epi32^{⚠}  x86 and avx2 Blend packed 32-bit integers from
_mm256_blendv_epi8^{⚠}  x86 and avx2 Blend packed 8-bit integers from
_mm256_broadcastb_epi8^{⚠}  x86 and avx2 Broadcast the low packed 8-bit integer from
_mm256_broadcastd_epi32^{⚠}  x86 and avx2 Broadcast the low packed 32-bit integer from
_mm256_broadcastq_epi64^{⚠}  x86 and avx2 Broadcast the low packed 64-bit integer from
_mm256_broadcastsi128_si256^{⚠}  x86 and avx2 Broadcast 128 bits of integer data from a to all 128-bit lanes in the 256-bit returned value.
_mm256_broadcastw_epi16^{⚠}  x86 and avx2 Broadcast the low packed 16-bit integer from a to all elements of the 256-bit returned value
_mm256_bslli_epi128^{⚠}  x86 and avx2 Shift 128-bit lanes in
_mm256_bsrli_epi128^{⚠}  x86 and avx2 Shift 128-bit lanes in
_mm256_castpd128_pd256^{⚠}  x86 and avx Casts vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined. 
_mm256_castpd256_pd128^{⚠}  x86 and avx Casts vector of type __m256d to type __m128d. 
_mm256_castpd_si256^{⚠}  x86 and avx Casts vector of type __m256d to type __m256i. 
_mm256_castps128_ps256^{⚠}  x86 and avx Casts vector of type __m128 to type __m256; the upper 128 bits of the result are undefined. 
_mm256_castps256_ps128^{⚠}  x86 and avx Casts vector of type __m256 to type __m128. 
_mm256_castps_si256^{⚠}  x86 and avx Casts vector of type __m256 to type __m256i. 
_mm256_castsi256_ps^{⚠}  x86 and avx Casts vector of type __m256i to type __m256. 
_mm256_castsi256_pd^{⚠}  x86 and avx Casts vector of type __m256i to type __m256d. 
_mm256_castsi128_si256^{⚠}  x86 and avx Casts vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined. 
_mm256_castsi256_si128^{⚠}  x86 and avx Casts vector of type __m256i to type __m128i. 
_mm256_cmpeq_epi8^{⚠}  x86 and avx2 Compare packed 8-bit integers in
_mm256_cmpeq_epi16^{⚠}  x86 and avx2 Compare packed 16-bit integers in
_mm256_cmpeq_epi32^{⚠}  x86 and avx2 Compare packed 32-bit integers in
_mm256_cmpeq_epi64^{⚠}  x86 and avx2 Compare packed 64-bit integers in
_mm256_cmpgt_epi8^{⚠}  x86 and avx2 Compare packed 8-bit integers in
_mm256_cmpgt_epi16^{⚠}  x86 and avx2 Compare packed 16-bit integers in
_mm256_cmpgt_epi32^{⚠}  x86 and avx2 Compare packed 32-bit integers in
_mm256_cmpgt_epi64^{⚠}  x86 and avx2 Compare packed 64-bit integers in
_mm256_cvtepi32_pd^{⚠}  x86 and avx Convert packed 32-bit integers in
_mm256_cvtepi32_ps^{⚠}  x86 and avx Convert packed 32-bit integers in
_mm256_cvtepi16_epi32^{⚠}  x86 and avx2 Sign-extend 16-bit integers to 32-bit integers.
_mm256_cvtepi16_epi64^{⚠}  x86 and avx2 Sign-extend 16-bit integers to 64-bit integers.
_mm256_cvtepi32_epi64^{⚠}  x86 and avx2 Sign-extend 32-bit integers to 64-bit integers.
_mm256_cvtepi8_epi16^{⚠}  x86 and avx2 Sign-extend 8-bit integers to 16-bit integers.
_mm256_cvtepi8_epi32^{⚠}  x86 and avx2 Sign-extend 8-bit integers to 32-bit integers.
_mm256_cvtepi8_epi64^{⚠}  x86 and avx2 Sign-extend 8-bit integers to 64-bit integers.
_mm256_cvtepu16_epi32^{⚠}  x86 and avx2 Zero-extend packed unsigned 16-bit integers in
_mm256_cvtepu16_epi64^{⚠}  x86 and avx2 Zero-extend the lower four unsigned 16-bit integers in
_mm256_cvtepu32_epi64^{⚠}  x86 and avx2 Zero-extend unsigned 32-bit integers in
_mm256_cvtepu8_epi16^{⚠}  x86 and avx2 Zero-extend unsigned 8-bit integers in
_mm256_cvtepu8_epi32^{⚠}  x86 and avx2 Zero-extend the lower eight unsigned 8-bit integers in
_mm256_cvtepu8_epi64^{⚠}  x86 and avx2 Zero-extend the lower four unsigned 8-bit integers in
_mm256_cvtpd_epi32^{⚠}  x86 and avx Convert packed double-precision (64-bit) floating-point elements in
_mm256_cvtps_epi32^{⚠}  x86 and avx Convert packed single-precision (32-bit) floating-point elements in
_mm256_cvtsd_f64^{⚠}  x86 and avx2 Returns the first element of the input vector of 
_mm256_cvtsi256_si32^{⚠}  x86 and avx2 Returns the first element of the input vector of 
_mm256_cvtss_f32^{⚠}  x86 and avx Returns the first element of the input vector of 
_mm256_cvttpd_epi32^{⚠}  x86 and avx Convert packed double-precision (64-bit) floating-point elements in
_mm256_cvttps_epi32^{⚠}  x86 and avx Convert packed single-precision (32-bit) floating-point elements in
_mm256_extract_epi8^{⚠}  x86 and avx2 Extract an 8-bit integer from
_mm256_extract_epi16^{⚠}  x86 and avx2 Extract a 16-bit integer from
_mm256_extract_epi32^{⚠}  x86 and avx2 Extract a 32-bit integer from
_mm256_extractf128_ps^{⚠}  x86 and avx Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from
_mm256_extractf128_pd^{⚠}  x86 and avx Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from
_mm256_extractf128_si256^{⚠}  x86 and avx Extract 128 bits (composed of integer data) from
_mm256_extracti128_si256^{⚠}  x86 and avx2 Extract 128 bits (of integer data) from
_mm256_hadd_epi16^{⚠}  x86 and avx2 Horizontally add adjacent pairs of 16-bit integers in
_mm256_hadd_epi32^{⚠}  x86 and avx2 Horizontally add adjacent pairs of 32-bit integers in
_mm256_hadds_epi16^{⚠}  x86 and avx2 Horizontally add adjacent pairs of 16-bit integers in
_mm256_hsub_epi16^{⚠}  x86 and avx2 Horizontally subtract adjacent pairs of 16-bit integers in
_mm256_hsub_epi32^{⚠}  x86 and avx2 Horizontally subtract adjacent pairs of 32-bit integers in
_mm256_hsubs_epi16^{⚠}  x86 and avx2 Horizontally subtract adjacent pairs of 16-bit integers in
_mm256_i32gather_ps^{⚠}  x86 and avx2 Return values from 
_mm256_i32gather_pd^{⚠}  x86 and avx2 Return values from 
_mm256_i64gather_ps^{⚠}  x86 and avx2 Return values from 
_mm256_i64gather_pd^{⚠}  x86 and avx2 Return values from 
_mm256_i32gather_epi32^{⚠}  x86 and avx2 Return values from 
_mm256_i32gather_epi64^{⚠}  x86 and avx2 Return values from 
_mm256_i64gather_epi32^{⚠}  x86 and avx2 Return values from 
_mm256_i64gather_epi64^{⚠}  x86 and avx2 Return values from 
_mm256_insert_epi8^{⚠}  x86 and avx Copy 
_mm256_insert_epi16^{⚠}  x86 and avx Copy 
_mm256_insert_epi32^{⚠}  x86 and avx Copy 
_mm256_insertf128_ps^{⚠}  x86 and avx Copy 
_mm256_insertf128_pd^{⚠}  x86 and avx Copy 
_mm256_insertf128_si256^{⚠}  x86 and avx Copy 
_mm256_inserti128_si256^{⚠}  x86 and avx2 Copy 
_mm256_lddqu_si256^{⚠}  x86 and avx Load 256 bits of integer data from unaligned memory into result. This intrinsic may perform better than
_mm256_load_si256^{⚠}  x86 and avx Load 256 bits of integer data from memory into result.
_mm256_loadu2_m128^{⚠}  x86 and avx,sse Load two 128-bit values (composed of 4 packed single-precision (32-bit) floating-point elements) from memory, and combine them into a 256-bit value.
_mm256_loadu2_m128d^{⚠}  x86 and avx,sse2 Load two 128-bit values (composed of 2 packed double-precision (64-bit) floating-point elements) from memory, and combine them into a 256-bit value.
_mm256_loadu2_m128i^{⚠}  x86 and avx,sse2 Load two 128-bit values (composed of integer data) from memory, and combine them into a 256-bit value.
_mm256_loadu_si256^{⚠}  x86 and avx Load 256 bits of integer data from memory into result.
_mm256_madd_epi16^{⚠}  x86 and avx2 Multiply packed signed 16-bit integers in
_mm256_maddubs_epi16^{⚠}  x86 and avx2 Vertically multiply each unsigned 8-bit integer from
_mm256_mask_i32gather_ps^{⚠}  x86 and avx2 Return values from 
_mm256_mask_i32gather_pd^{⚠}  x86 and avx2 Return values from 
_mm256_mask_i64gather_ps^{⚠}  x86 and avx2 Return values from 
_mm256_mask_i64gather_pd^{⚠}  x86 and avx2 Return values from 
_mm256_mask_i32gather_epi32^{⚠}  x86 and avx2 Return values from 
_mm256_mask_i32gather_epi64^{⚠}  x86 and avx2 Return values from 
_mm256_mask_i64gather_epi32^{⚠}  x86 and avx2 Return values from 
_mm256_mask_i64gather_epi64^{⚠}  x86 and avx2 Return values from 
_mm256_maskload_epi32^{⚠}  x86 and avx2 Load packed 32-bit integers from memory pointed by
_mm256_maskload_epi64^{⚠}  x86 and avx2 Load packed 64-bit integers from memory pointed by
_mm256_maskstore_epi32^{⚠}  x86 and avx2 Store packed 32-bit integers from
_mm256_maskstore_epi64^{⚠}  x86 and avx2 Store packed 64-bit integers from
_mm256_max_epi8^{⚠}  x86 and avx2 Compare packed 8-bit integers in
_mm256_max_epi16^{⚠}  x86 and avx2 Compare packed 16-bit integers in
_mm256_max_epi32^{⚠}  x86 and avx2 Compare packed 32-bit integers in
_mm256_max_epu8^{⚠}  x86 and avx2 Compare packed unsigned 8-bit integers in
_mm256_max_epu16^{⚠}  x86 and avx2 Compare packed unsigned 16-bit integers in
_mm256_max_epu32^{⚠}  x86 and avx2 Compare packed unsigned 32-bit integers in
_mm256_min_epi8^{⚠}  x86 and avx2 Compare packed 8-bit integers in
_mm256_min_epi16^{⚠}  x86 and avx2 Compare packed 16-bit integers in
_mm256_min_epi32^{⚠}  x86 and avx2 Compare packed 32-bit integers in
_mm256_min_epu8^{⚠}  x86 and avx2 Compare packed unsigned 8-bit integers in
_mm256_min_epu16^{⚠}  x86 and avx2 Compare packed unsigned 16-bit integers in
_mm256_min_epu32^{⚠}  x86 and avx2 Compare packed unsigned 32-bit integers in
_mm256_movemask_epi8^{⚠}  x86 and avx2 Create mask from the most significant bit of each 8-bit element in
_mm256_mpsadbw_epu8^{⚠}  x86 and avx2 Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in
_mm256_mul_epi32^{⚠}  x86 and avx2 Multiply the low 32-bit integers from each packed 64-bit element in
_mm256_mul_epu32^{⚠}  x86 and avx2 Multiply the low unsigned 32-bit integers from each packed 64-bit element in
_mm256_mulhi_epi16^{⚠}  x86 and avx2 Multiply the packed 16-bit integers in
_mm256_mulhi_epu16^{⚠}  x86 and avx2 Multiply the packed unsigned 16-bit integers in
_mm256_mulhrs_epi16^{⚠}  x86 and avx2 Multiply packed 16-bit integers in
_mm256_mullo_epi16^{⚠}  x86 and avx2 Multiply the packed 16-bit integers in
_mm256_mullo_epi32^{⚠}  x86 and avx2 Multiply the packed 32-bit integers in
_mm256_or_si256^{⚠}  x86 and avx2 Compute the bitwise OR of 256 bits (representing integer data) in 
_mm256_packs_epi16^{⚠}  x86 and avx2 Convert packed 16-bit integers from
_mm256_packs_epi32^{⚠}  x86 and avx2 Convert packed 32-bit integers from
_mm256_packus_epi16^{⚠}  x86 and avx2 Convert packed 16-bit integers from
_mm256_packus_epi32^{⚠}  x86 and avx2 Convert packed 32-bit integers from
_mm256_permute2f128_ps^{⚠}  x86 and avx Shuffle 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) selected by
_mm256_permute2f128_pd^{⚠}  x86 and avx Shuffle 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) selected by
_mm256_permute2f128_si256^{⚠}  x86 and avx Shuffle 256 bits (composed of integer data) selected by
_mm256_permute2x128_si256^{⚠}  x86 and avx2 Shuffle 128 bits of integer data selected by
_mm256_permute4x64_pd^{⚠}  x86 and avx2 Shuffle 64-bit floating-point elements in
_mm256_permute4x64_epi64^{⚠}  x86 and avx2 Permutes 64-bit integers from
_mm256_permutevar8x32_ps^{⚠}  x86 and avx2 Shuffle eight 32-bit floating-point elements in
_mm256_permutevar8x32_epi32^{⚠}  x86 and avx2 Permutes packed 32bit integers from 
_mm256_sad_epu8^{⚠}  x86 and avx2 Compute the absolute differences of packed unsigned 8bit integers in 
_mm256_set1_pd^{⚠}  x86 and avx Broadcast doubleprecision (64bit) floatingpoint value 
_mm256_set1_ps^{⚠}  x86 and avx Broadcast singleprecision (32bit) floatingpoint value 
_mm256_set1_epi8^{⚠}  x86 and avx Broadcast 8bit integer 
_mm256_set1_epi16^{⚠}  x86 and avx Broadcast 16bit integer 
_mm256_set1_epi32^{⚠}  x86 and avx Broadcast 32bit integer 
_mm256_set1_epi64x^{⚠}  x86 and avx Broadcast 64bit integer 
_mm256_set_epi8^{⚠}  x86 and avx Set packed 8bit integers in returned vector with the supplied values. 
_mm256_set_epi16^{⚠}  x86 and avx Set packed 16bit integers in returned vector with the supplied values. 
_mm256_set_epi32^{⚠}  x86 and avx Set packed 32bit integers in returned vector with the supplied values. 
_mm256_set_epi64x^{⚠}  x86 and avx Set packed 64bit integers in returned vector with the supplied values. 
_mm256_set_m128^{⚠}  x86 and avx Set packed __m256 returned vector with the supplied values. 
_mm256_set_m128d^{⚠}  x86 and avx Set packed __m256d returned vector with the supplied values. 
_mm256_set_m128i^{⚠}  x86 and avx Set packed __m256i returned vector with the supplied values. 
_mm256_setr_epi8^{⚠}  x86 and avx Set packed 8bit integers in returned vector with the supplied values in reverse order. 
_mm256_setr_epi16^{⚠}  x86 and avx Set packed 16bit integers in returned vector with the supplied values in reverse order. 
_mm256_setr_epi32^{⚠}  x86 and avx Set packed 32bit integers in returned vector with the supplied values in reverse order. 
_mm256_setr_epi64x^{⚠}  x86 and avx Set packed 64bit integers in returned vector with the supplied values in reverse order. 
_mm256_setr_m128^{⚠}  x86 and avx Set packed __m256 returned vector with the supplied values. 
_mm256_setr_m128d^{⚠}  x86 and avx Set packed __m256d returned vector with the supplied values. 
_mm256_setr_m128i^{⚠}  x86 and avx Set packed __m256i returned vector with the supplied values. 
_mm256_setzero_si256^{⚠}  x86 and avx Return vector of type __m256i with all elements set to zero. 
_mm256_shuffle_epi8^{⚠}  x86 and avx2 Shuffle bytes from 
_mm256_shuffle_epi32^{⚠}  x86 and avx2 Shuffle 32bit integers in 128bit lanes of 
_mm256_shufflehi_epi16^{⚠}  x86 and avx2 Shuffle 16bit integers in the high 64 bits of 128bit lanes of 
_mm256_shufflelo_epi16^{⚠}  x86 and avx2 Shuffle 16bit integers in the low 64 bits of 128bit lanes of 
_mm256_sign_epi8^{⚠}  x86 and avx2 Negate packed 8bit integers in 
_mm256_sign_epi16^{⚠}  x86 and avx2 Negate packed 16bit integers in 
_mm256_sign_epi32^{⚠}  x86 and avx2 Negate packed 32bit integers in 
_mm256_sll_epi16^{⚠}  x86 and avx2 Shift packed 16bit integers in 
_mm256_sll_epi32^{⚠}  x86 and avx2 Shift packed 32bit integers in 
_mm256_sll_epi64^{⚠}  x86 and avx2 Shift packed 64bit integers in 
_mm256_slli_epi16^{⚠}  x86 and avx2 Shift packed 16bit integers in 
_mm256_slli_epi32^{⚠}  x86 and avx2 Shift packed 32bit integers in 
_mm256_slli_epi64^{⚠}  x86 and avx2 Shift packed 64bit integers in 
_mm256_slli_si256^{⚠}  x86 and avx2 Shift 128bit lanes in 
_mm256_sllv_epi32^{⚠}  x86 and avx2 Shift packed 32bit integers in 
_mm256_sllv_epi64^{⚠}  x86 and avx2 Shift packed 64bit integers in 
_mm256_sra_epi16^{⚠}  x86 and avx2 Shift packed 16bit integers in 
_mm256_sra_epi32^{⚠}  x86 and avx2 Shift packed 32bit integers in 
_mm256_srai_epi16^{⚠}  x86 and avx2 Shift packed 16bit integers in 
_mm256_srai_epi32^{⚠}  x86 and avx2 Shift packed 32bit integers in 
_mm256_srav_epi32^{⚠}  x86 and avx2 Shift packed 32bit integers in 
_mm256_srl_epi16^{⚠}  x86 and avx2 Shift packed 16bit integers in 
_mm256_srl_epi32^{⚠}  x86 and avx2 Shift packed 32bit integers in 
_mm256_srl_epi64^{⚠}  x86 and avx2 Shift packed 64bit integers in 
_mm256_srli_epi16^{⚠}  x86 and avx2 Shift packed 16bit integers in 
_mm256_srli_epi32^{⚠}  x86 and avx2 Shift packed 32bit integers in 
_mm256_srli_epi64^{⚠}  x86 and avx2 Shift packed 64bit integers in 
_mm256_srli_si256^{⚠}  x86 and avx2 Shift 128bit lanes in 
_mm256_srlv_epi32^{⚠}  x86 and avx2 Shift packed 32bit integers in 
_mm256_srlv_epi64^{⚠}  x86 and avx2 Shift packed 64bit integers in 
_mm256_store_si256^{⚠}  x86 and avx Store 256bits of integer data from 
_mm256_storeu2_m128^{⚠}  x86 and avx,sse Store the high and low 128bit halves (each composed of 4 packed
singleprecision (32bit) floatingpoint elements) from 
_mm256_storeu2_m128d^{⚠}  x86 and avx,sse2 Store the high and low 128bit halves (each composed of 2 packed
doubleprecision (64bit) floatingpoint elements) from 
_mm256_storeu2_m128i^{⚠}  x86 and avx,sse2 Store the high and low 128bit halves (each composed of integer data) from

_mm256_storeu_si256^{⚠}  x86 and avx Store 256bits of integer data from 
_mm256_stream_si256^{⚠}  x86 and avx Moves integer data from a 256bit integer vector to a 32byte aligned memory location. To minimize caching, the data is flagged as nontemporal (unlikely to be used again soon). 
_mm256_sub_epi8^{⚠}  x86 and avx2 Subtract packed 8bit integers in 
_mm256_sub_epi16^{⚠}  x86 and avx2 Subtract packed 16bit integers in 
_mm256_sub_epi32^{⚠}  x86 and avx2 Subtract packed 32bit integers in 
_mm256_sub_epi64^{⚠}  x86 and avx2 Subtract packed 64bit integers in 
_mm256_subs_epi8^{⚠}  x86 and avx2 Subtract packed 8bit integers in 
_mm256_subs_epi16^{⚠}  x86 and avx2 Subtract packed 16bit integers in 
_mm256_subs_epu8^{⚠}  x86 and avx2 Subtract packed unsigned 8bit integers in 
_mm256_subs_epu16^{⚠}  x86 and avx2 Subtract packed unsigned 16bit integers in 
_mm256_testc_si256^{⚠}  x86 and avx Compute the bitwise AND of 256 bits (representing integer data) in 
_mm256_testnzc_si256^{⚠}  x86 and avx Compute the bitwise AND of 256 bits (representing integer data) in 
_mm256_testz_si256^{⚠}  x86 and avx Compute the bitwise AND of 256 bits (representing integer data) in 
_mm256_undefined_si256^{⚠}  x86 and avx Return vector of type __m256i with undefined elements. 
_mm256_unpackhi_epi8^{⚠}  x86 and avx2 Unpack and interleave 8bit integers from the high half of each
128bit lane in 
_mm256_unpackhi_epi16^{⚠}  x86 and avx2 Unpack and interleave 16bit integers from the high half of each
128bit lane of 
_mm256_unpackhi_epi32^{⚠}  x86 and avx2 Unpack and interleave 32bit integers from the high half of each
128bit lane of 
_mm256_unpackhi_epi64^{⚠}  x86 and avx2 Unpack and interleave 64bit integers from the high half of each
128bit lane of 
_mm256_unpacklo_epi8^{⚠}  x86 and avx2 Unpack and interleave 8bit integers from the low half of each
128bit lane of 
_mm256_unpacklo_epi16^{⚠}  x86 and avx2 Unpack and interleave 16bit integers from the low half of each
128bit lane of 
_mm256_unpacklo_epi32^{⚠}  x86 and avx2 Unpack and interleave 32bit integers from the low half of each
128bit lane of 
_mm256_unpacklo_epi64^{⚠}  x86 and avx2 Unpack and interleave 64bit integers from the low half of each
128bit lane of 
_mm256_xor_si256^{⚠}  x86 and avx2 Compute the bitwise XOR of 256 bits (representing integer data)
in 
_mm256_zextpd128_pd256^{⚠}  x86 and avx,sse2 Constructs a 256bit floatingpoint vector of 
_mm256_zextps128_ps256^{⚠}  x86 and avx,sse Constructs a 256bit floatingpoint vector of 
_mm256_zextsi128_si256^{⚠}  x86 and avx,sse2 Constructs a 256bit integer vector from a 128bit integer vector. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero. 
_mm_abs_epi8^{⚠}  x86 and ssse3 Compute the absolute value of packed 8bit signed integers in 
_mm_abs_epi16^{⚠}  x86 and ssse3 Compute the absolute value of each of the packed 16bit signed integers in

_mm_abs_epi32^{⚠}  x86 and ssse3 Compute the absolute value of each of the packed 32bit signed integers in

_mm_add_epi8^{⚠}  x86 and sse2 Add packed 8bit integers in 
_mm_add_epi16^{⚠}  x86 and sse2 Add packed 16bit integers in 
_mm_add_epi32^{⚠}  x86 and sse2 Add packed 32bit integers in 
_mm_add_epi64^{⚠}  x86 and sse2 Add packed 64bit integers in 
_mm_add_pd^{⚠}  x86 and sse2 Add packed doubleprecision (64bit) floatingpoint elements in 
_mm_add_ps^{⚠}  x86 and sse Adds __m128 vectors. 
_mm_add_sd^{⚠}  x86 and sse2 Return a new vector with the low element of 
_mm_add_ss^{⚠}  x86 and sse Adds the first component of 
_mm_adds_epi8^{⚠}  x86 and sse2 Add packed 8bit integers in 
_mm_adds_epi16^{⚠}  x86 and sse2 Add packed 16bit integers in 
_mm_adds_epu8^{⚠}  x86 and sse2 Add packed unsigned 8bit integers in 
_mm_adds_epu16^{⚠}  x86 and sse2 Add packed unsigned 16bit integers in 
_mm_addsub_pd^{⚠}  x86 and sse3 Alternately add and subtract packed doubleprecision (64bit)
floatingpoint elements in 
_mm_addsub_ps^{⚠}  x86 and sse3 Alternately add and subtract packed singleprecision (32bit)
floatingpoint elements in 
_mm_aesdec_si128^{⚠}  x86 and aes Perform one round of an AES decryption flow on data (state) in 
_mm_aesdeclast_si128^{⚠}  x86 and aes Perform the last round of an AES decryption flow on data (state) in 
_mm_aesenc_si128^{⚠}  x86 and aes Perform one round of an AES encryption flow on data (state) in 
_mm_aesenclast_si128^{⚠}  x86 and aes Perform the last round of an AES encryption flow on data (state) in 
_mm_aesimc_si128^{⚠}  x86 and aes Perform the 
_mm_aeskeygenassist_si128^{⚠}  x86 and aes Assist in expanding the AES cipher key. 
_mm_alignr_epi8^{⚠}  x86 and ssse3 Concatenate 16byte blocks in 
_mm_and_pd^{⚠}  x86 and sse2 Compute the bitwise AND of packed doubleprecision (64bit) floatingpoint
elements in 
_mm_and_ps^{⚠}  x86 and sse Bitwise AND of packed singleprecision (32bit) floatingpoint elements. 
_mm_and_si128^{⚠}  x86 and sse2 Compute the bitwise AND of 128 bits (representing integer data) in 
_mm_andnot_pd^{⚠}  x86 and sse2 Compute the bitwise NOT of 
_mm_andnot_ps^{⚠}  x86 and sse Bitwise ANDNOT of packed singleprecision (32bit) floatingpoint elements. 
_mm_andnot_si128^{⚠}  x86 and sse2 Compute the bitwise NOT of 128 bits (representing integer data) in 
_mm_avg_epu8^{⚠}  x86 and sse2 Average packed unsigned 8bit integers in 
_mm_avg_epu16^{⚠}  x86 and sse2 Average packed unsigned 16bit integers in 
_mm_blend_epi16^{⚠}  x86 and sse4.1 Blend packed 16bit integers from 
_mm_blend_epi32^{⚠}  x86 and avx2 Blend packed 32bit integers from 
_mm_blend_pd^{⚠}  x86 and sse4.1 Blend packed doubleprecision (64bit) floatingpoint elements from 
_mm_blend_ps^{⚠}  x86 and sse4.1 Blend packed singleprecision (32bit) floatingpoint elements from 
_mm_blendv_epi8^{⚠}  x86 and sse4.1 Blend packed 8bit integers from 
_mm_blendv_pd^{⚠}  x86 and sse4.1 Blend packed doubleprecision (64bit) floatingpoint elements from 
_mm_blendv_ps^{⚠}  x86 and sse4.1 Blend packed singleprecision (32bit) floatingpoint elements from 
_mm_broadcast_ss^{⚠}  x86 and avx Broadcast a singleprecision (32bit) floatingpoint element from memory to all elements of the returned vector. 
_mm_broadcastb_epi8^{⚠}  x86 and avx2 Broadcast the low packed 8bit integer from 
_mm_broadcastd_epi32^{⚠}  x86 and avx2 Broadcast the low packed 32bit integer from 
_mm_broadcastq_epi64^{⚠}  x86 and avx2 Broadcast the low packed 64bit integer from 
_mm_broadcastsd_pd^{⚠}  x86 and avx2 Broadcast the low doubleprecision (64bit) floatingpoint element
from 
_mm_broadcastss_ps^{⚠}  x86 and avx2 Broadcast the low singleprecision (32bit) floatingpoint element
from 
_mm_broadcastw_epi16^{⚠}  x86 and avx2 Broadcast the low packed 16bit integer from a to all elements of the 128bit returned value. 
_mm_bslli_si128^{⚠}  x86 and sse2 Shift 
_mm_bsrli_si128^{⚠}  x86 and sse2 Shift 
_mm_castpd_ps^{⚠}  x86 and sse2 Casts a 128bit floatingpoint vector of 
_mm_castpd_si128^{⚠}  x86 and sse2 Casts a 128bit floatingpoint vector of 
_mm_castps_pd^{⚠}  x86 and sse2 Casts a 128bit floatingpoint vector of 
_mm_castps_si128^{⚠}  x86 and sse2 Casts a 128bit floatingpoint vector of 
_mm_castsi128_pd^{⚠}  x86 and sse2 Casts a 128bit integer vector into a 128bit floatingpoint vector
of 
_mm_castsi128_ps^{⚠}  x86 and sse2 Casts a 128bit integer vector into a 128bit floatingpoint vector
of 
_mm_ceil_pd^{⚠}  x86 and sse4.1 Round the packed doubleprecision (64bit) floatingpoint elements in 
_mm_ceil_ps^{⚠}  x86 and sse4.1 Round the packed singleprecision (32bit) floatingpoint elements in 
_mm_ceil_sd^{⚠}  x86 and sse4.1 Round the lower doubleprecision (64bit) floatingpoint element in 
_mm_ceil_ss^{⚠}  x86 and sse4.1 Round the lower singleprecision (32bit) floatingpoint element in 
_mm_clflush^{⚠}  x86 and sse2 Invalidate and flush the cache line that contains 
_mm_clmulepi64_si128^{⚠}  x86 and pclmulqdq Perform a carryless multiplication of two 64bit polynomials over the finite field GF(2^k). 
_mm_cmp_pd^{⚠}  x86 and avx,sse2 Compare packed doubleprecision (64bit) floatingpoint
elements in 
_mm_cmp_ps^{⚠}  x86 and avx,sse Compare packed singleprecision (32bit) floatingpoint
elements in 
_mm_cmp_sd^{⚠}  x86 and avx,sse2 Compare the lower doubleprecision (64bit) floatingpoint element in

_mm_cmp_ss^{⚠}  x86 and avx,sse Compare the lower singleprecision (32bit) floatingpoint element in

_mm_cmpeq_epi8^{⚠}  x86 and sse2 Compare packed 8bit integers in 
_mm_cmpeq_epi16^{⚠}  x86 and sse2 Compare packed 16bit integers in 
_mm_cmpeq_epi32^{⚠}  x86 and sse2 Compare packed 32bit integers in 
_mm_cmpeq_epi64^{⚠}  x86 and sse4.1 Compare packed 64bit integers in 
_mm_cmpeq_pd^{⚠}  x86 and sse2 Compare corresponding elements in 
_mm_cmpeq_ps^{⚠}  x86 and sse Compare each of the four floats in 
_mm_cmpeq_sd^{⚠}  x86 and sse2 Return a new vector with the low element of 
_mm_cmpeq_ss^{⚠}  x86 and sse Compare the lowest 
_mm_cmpestra^{⚠}  x86 and sse4.2 Compare packed strings in 
_mm_cmpestrc^{⚠}  x86 and sse4.2 Compare packed strings in 
_mm_cmpestri^{⚠}  x86 and sse4.2 Compare packed strings 
_mm_cmpestrm^{⚠}  x86 and sse4.2 Compare packed strings in 
_mm_cmpestro^{⚠}  x86 and sse4.2 Compare packed strings in 
_mm_cmpestrs^{⚠}  x86 and sse4.2 Compare packed strings in 
_mm_cmpestrz^{⚠}  x86 and sse4.2 Compare packed strings in 
_mm_cmpge_pd^{⚠}  x86 and sse2 Compare corresponding elements in 
_mm_cmpge_ps^{⚠}  x86 and sse Compare each of the four floats in 
_mm_cmpge_sd^{⚠}  x86 and sse2 Return a new vector with the low element of 
_mm_cmpge_ss^{⚠}  x86 and sse Compare the lowest 
_mm_cmpgt_epi8^{⚠}  x86 and sse2 Compare packed 8bit integers in 
_mm_cmpgt_epi16^{⚠}  x86 and sse2 Compare packed 16bit integers in 
_mm_cmpgt_epi32^{⚠}  x86 and sse2 Compare packed 32bit integers in 
_mm_cmpgt_epi64^{⚠}  x86 and sse4.2 Compare packed 64bit integers in 
_mm_cmpgt_pd^{⚠}  x86 and sse2 Compare corresponding elements in 
_mm_cmpgt_ps^{⚠}  x86 and sse Compare each of the four floats in 
_mm_cmpgt_sd^{⚠}  x86 and sse2 Return a new vector with the low element of 
_mm_cmpgt_ss^{⚠}  x86 and sse Compare the lowest 
_mm_cmpistra^{⚠}  x86 and sse4.2 Compare packed strings with implicit lengths in 
_mm_cmpistrc^{⚠}  x86 and sse4.2 Compare packed strings with implicit lengths in 
_mm_cmpistri^{⚠}  x86 and sse4.2 Compare packed strings with implicit lengths in 
_mm_cmpistrm^{⚠}  x86 and sse4.2 Compare packed strings with implicit lengths in 
_mm_cmpistro^{⚠}  x86 and sse4.2 Compare packed strings with implicit lengths in 
_mm_cmpistrs^{⚠}  x86 and sse4.2 Compare packed strings with implicit lengths in 
_mm_cmpistrz^{⚠}  x86 and sse4.2 Compare packed strings with implicit lengths in 
_mm_cmple_pd^{⚠}  x86 and sse2 Compare corresponding elements in 
_mm_cmple_ps^{⚠}  x86 and sse Compare each of the four floats in 
_mm_cmple_sd^{⚠}  x86 and sse2 Return a new vector with the low element of 
_mm_cmple_ss^{⚠}  x86 and sse Compare the lowest 
_mm_cmplt_epi8^{⚠}  x86 and sse2 Compare packed 8bit integers in 
_mm_cmplt_epi16^{⚠}  x86 and sse2 Compare packed 16bit integers in 
_mm_cmplt_epi32^{⚠}  x86 and sse2 Compare packed 32bit integers in 
_mm_cmplt_pd^{⚠}  x86 and sse2 Compare corresponding elements in 
_mm_cmplt_ps^{⚠}  x86 and sse Compare each of the four floats in 
_mm_cmplt_sd^{⚠}  x86 and sse2 Return a new vector with the low element of 
_mm_cmplt_ss^{⚠}  x86 and sse Compare the lowest 
_mm_cmpneq_pd^{⚠}  x86 and sse2 Compare corresponding elements in 
_mm_cmpneq_ps^{⚠}  x86 and sse Compare each of the four floats in 
_mm_cmpneq_sd^{⚠}  x86 and sse2 Return a new vector with the low element of 
_mm_cmpneq_ss^{⚠}  x86 and sse Compare the lowest 
_mm_cmpnge_pd^{⚠}  x86 and sse2 Compare corresponding elements in 
_mm_cmpnge_ps^{⚠}  x86 and sse Compare each of the four floats in 
_mm_cmpnge_sd^{⚠}  x86 and sse2 Return a new vector with the low element of 
_mm_cmpnge_ss^{⚠}  x86 and sse Compare the lowest 
_mm_cmpngt_pd^{⚠}  x86 and sse2 Compare corresponding elements in 
_mm_cmpngt_ps^{⚠}  x86 and sse Compare each of the four floats in 
_mm_cmpngt_sd^{⚠}  x86 and sse2 Return a new vector with the low element of 
_mm_cmpngt_ss^{⚠}  x86 and sse Compare the lowest 
_mm_cmpnle_pd^{⚠}  x86 and sse2 Compare corresponding elements in 
_mm_cmpnle_ps^{⚠}  x86 and sse Compare each of the four floats in 
_mm_cmpnle_sd^{⚠}  x86 and sse2 Return a new vector with the low element of 
_mm_cmpnle_ss^{⚠}  x86 and sse Compare the lowest 
_mm_cmpnlt_pd^{⚠}  x86 and sse2 Compare corresponding elements in 
_mm_cmpnlt_ps^{⚠}  x86 and sse Compare each of the four floats in 
_mm_cmpnlt_sd^{⚠}  x86 and sse2 Return a new vector with the low element of 
_mm_cmpnlt_ss^{⚠}  x86 and sse Compare the lowest 
_mm_cmpord_pd^{⚠}  x86 and sse2 Compare corresponding elements in 
_mm_cmpord_ps^{⚠}  x86 and sse Compare each of the four floats in 
_mm_cmpord_sd^{⚠}  x86 and sse2 Return a new vector with the low element of 
_mm_cmpord_ss^{⚠}  x86 and sse Check if the lowest 
_mm_cmpunord_pd^{⚠}  x86 and sse2 Compare corresponding elements in 
_mm_cmpunord_ps^{⚠}  x86 and sse Compare each of the four floats in 
_mm_cmpunord_sd^{⚠}  x86 and sse2 Return a new vector with the low element of 
_mm_cmpunord_ss^{⚠}  x86 and sse Check if the lowest 
_mm_comieq_sd^{⚠}  x86 and sse2 Compare the lower element of 
_mm_comieq_ss^{⚠}  x86 and sse Compare two 32bit floats from the loworder bits of 
_mm_comige_sd^{⚠}  x86 and sse2 Compare the lower element of 
_mm_comige_ss^{⚠}  x86 and sse Compare two 32bit floats from the loworder bits of 
_mm_comigt_sd^{⚠}  x86 and sse2 Compare the lower element of 
_mm_comigt_ss^{⚠}  x86 and sse Compare two 32bit floats from the loworder bits of 
_mm_comile_sd^{⚠}  x86 and sse2 Compare the lower element of 
_mm_comile_ss^{⚠}  x86 and sse Compare two 32bit floats from the loworder bits of 
_mm_comilt_sd^{⚠}  x86 and sse2 Compare the lower element of 
_mm_comilt_ss^{⚠}  x86 and sse Compare two 32bit floats from the loworder bits of 
_mm_comineq_sd^{⚠}  x86 and sse2 Compare the lower element of 
_mm_comineq_ss^{⚠}  x86 and sse Compare two 32bit floats from the loworder bits of 
_mm_crc32_u8^{⚠}  x86 and sse4.2 Starting with the initial value in 
_mm_crc32_u16^{⚠}  x86 and sse4.2 Starting with the initial value in 
_mm_crc32_u32^{⚠}  x86 and sse4.2 Starting with the initial value in 
_mm_cvt_si2ss^{⚠}  x86 and sse Alias for 
_mm_cvt_ss2si^{⚠}  x86 and sse Alias for 
_mm_cvtepi32_pd^{⚠}  x86 and sse2 Convert the lower two packed 32bit integers in 
_mm_cvtepi32_ps^{⚠}  x86 and sse2 Convert packed 32bit integers in 
_mm_cvtepi16_epi32^{⚠}  x86 and sse4.1 Sign extend packed 16bit integers in 
_mm_cvtepi16_epi64^{⚠}  x86 and sse4.1 Sign extend packed 16bit integers in 
_mm_cvtepi32_epi64^{⚠}  x86 and sse4.1 Sign extend packed 32bit integers in 
_mm_cvtepi8_epi16^{⚠}  x86 and sse4.1 Sign extend packed 8bit integers in 
_mm_cvtepi8_epi32^{⚠}  x86 and sse4.1 Sign extend packed 8bit integers in 
_mm_cvtepi8_epi64^{⚠}  x86 and sse4.1 Sign extend packed 8bit integers in the low 8 bytes of 
_mm_cvtepu16_epi32^{⚠}  x86 and sse4.1 Zero extend packed unsigned 16bit integers in 
_mm_cvtepu16_epi64^{⚠}  x86 and sse4.1 Zero extend packed unsigned 16bit integers in 
_mm_cvtepu32_epi64^{⚠}  x86 and sse4.1 Zero extend packed unsigned 32bit integers in 
_mm_cvtepu8_epi16^{⚠}  x86 and sse4.1 Zero extend packed unsigned 8bit integers in 
_mm_cvtepu8_epi32^{⚠}  x86 and sse4.1 Zero extend packed unsigned 8bit integers in 
_mm_cvtepu8_epi64^{⚠}  x86 and sse4.1 Zero extend packed unsigned 8bit integers in 
_mm_cvtpd_epi32^{⚠}  x86 and sse2 Convert packed doubleprecision (64bit) floatingpoint elements in 
_mm_cvtpd_ps^{⚠}  x86 and sse2 Convert packed doubleprecision (64bit) floatingpoint elements in "a" to packed singleprecision (32bit) floatingpoint elements 
_mm_cvtps_epi32^{⚠}  x86 and sse2 Convert packed singleprecision (32bit) floatingpoint elements in 
_mm_cvtps_pd^{⚠}  x86 and sse2 Convert packed singleprecision (32bit) floatingpoint elements in 
_mm_cvtsd_f64^{⚠}  x86 and sse2 Return the lower doubleprecision (64bit) floatingpoint element of "a". 
_mm_cvtsd_si32^{⚠}  x86 and sse2 Convert the lower doubleprecision (64bit) floatingpoint element in a to a 32bit integer. 
_mm_cvtsd_ss^{⚠}  x86 and sse2 Convert the lower doubleprecision (64bit) floatingpoint element in 
_mm_cvtsi32_ss^{⚠}  x86 and sse Convert a 32 bit integer to a 32 bit float. The result vector is the input
vector 
_mm_cvtsi32_sd^{⚠}  x86 and sse2 Return 
_mm_cvtsi128_si32^{⚠}  x86 and sse2 Return the lowest element of 
_mm_cvtsi32_si128^{⚠}  x86 and sse2 Return a vector whose lowest element is 
_mm_cvtss_f32^{⚠}  x86 and sse Extract the lowest 32 bit float from the input vector. 
_mm_cvtss_sd^{⚠}  x86 and sse2 Convert the lower singleprecision (32bit) floatingpoint element in 
_mm_cvtss_si32^{⚠}  x86 and sse Convert the lowest 32 bit float in the input vector to a 32 bit integer. 
_mm_cvtt_ss2si^{⚠}  x86 and sse Alias for 
_mm_cvttpd_epi32^{⚠}  x86 and sse2 Convert packed doubleprecision (64bit) floatingpoint elements in 
_mm_cvttps_epi32^{⚠}  x86 and sse2 Convert packed singleprecision (32bit) floatingpoint elements in 
_mm_cvttsd_si32^{⚠}  x86 and sse2 Convert the lower doubleprecision (64bit) floatingpoint element in 
_mm_cvttss_si32^{⚠}  x86 and sse Convert the lowest 32 bit float in the input vector to a 32 bit integer with truncation. 
_mm_div_pd^{⚠}  x86 and sse2 Divide packed doubleprecision (64bit) floatingpoint elements in 
_mm_div_ps^{⚠}  x86 and sse Divides __m128 vectors. 
_mm_div_sd^{⚠}  x86 and sse2 Return a new vector with the low element of 
_mm_div_ss^{⚠}  x86 and sse Divides the first component of 
_mm_dp_pd^{⚠}  x86 and sse4.1 Returns the dot product of two __m128d vectors. 
_mm_dp_ps^{⚠}  x86 and sse4.1 Returns the dot product of two __m128 vectors. 
_mm_extract_epi8^{⚠}  x86 and sse4.1 Extract an 8bit integer from 
_mm_extract_epi16^{⚠}  x86 and sse2 Return the 
_mm_extract_epi32^{⚠}  x86 and sse4.1 Extract a 32bit integer from 
_mm_extract_ps^{⚠}  x86 and sse4.1 Extract a singleprecision (32bit) floatingpoint element from 
_mm_extract_si64^{⚠}  x86 and sse4a Extracts the bit range specified by 
_mm_floor_pd^{⚠}  x86 and sse4.1 Round the packed doubleprecision (64bit) floatingpoint elements in 
_mm_floor_ps^{⚠}  x86 and sse4.1 Round the packed singleprecision (32bit) floatingpoint elements in 
_mm_floor_sd^{⚠}  x86 and sse4.1 Round the lower doubleprecision (64bit) floatingpoint element in 
_mm_floor_ss^{⚠}  x86 and sse4.1 Round the lower singleprecision (32bit) floatingpoint element in 
_mm_fmadd_pd^{⚠}  x86 and fma Multiply packed doubleprecision (64bit) floatingpoint elements in 
_mm_fmadd_ps^{⚠}  x86 and fma Multiply packed singleprecision (32bit) floatingpoint elements in 
_mm_fmadd_sd^{⚠}  x86 and fma Multiply the lower doubleprecision (64bit) floatingpoint elements in

_mm_fmadd_ss^{⚠}  x86 and fma Multiply the lower singleprecision (32bit) floatingpoint elements in

_mm_fmaddsub_pd^{⚠}  x86 and fma Multiply packed doubleprecision (64bit) floatingpoint elements in 
_mm_fmaddsub_ps^{⚠}  x86 and fma Multiply packed singleprecision (32bit) floatingpoint elements in 
_mm_fmsub_pd^{⚠}  x86 and fma Multiply packed doubleprecision (64bit) floatingpoint elements in 
_mm_fmsub_ps^{⚠}  x86 and fma Multiply packed singleprecision (32bit) floatingpoint elements in 
_mm_fmsub_sd^{⚠}  x86 and fma Multiply the lower doubleprecision (64bit) floatingpoint elements in

_mm_fmsub_ss^{⚠}  x86 and fma Multiply the lower singleprecision (32bit) floatingpoint elements in

_mm_fmsubadd_pd^{⚠}  x86 and fma Multiply packed doubleprecision (64bit) floatingpoint elements in 
_mm_fmsubadd_ps^{⚠}  x86 and fma Multiply packed singleprecision (32bit) floatingpoint elements in 
_mm_fnmadd_pd^{⚠}  x86 and fma Multiply packed doubleprecision (64bit) floatingpoint elements in 
_mm_fnmadd_ps^{⚠}  x86 and fma Multiply packed singleprecision (32bit) floatingpoint elements in 
_mm_fnmadd_sd^{⚠}  x86 and fma Multiply the lower doubleprecision (64bit) floatingpoint elements in

_mm_fnmadd_ss^{⚠}  x86 and fma Multiply the lower singleprecision (32bit) floatingpoint elements in

_mm_fnmsub_pd^{⚠}  x86 and fma Multiply packed doubleprecision (64bit) floatingpoint elements in 
_mm_fnmsub_ps^{⚠}  x86 and fma Multiply packed singleprecision (32bit) floatingpoint elements in 
_mm_fnmsub_sd^{⚠}  x86 and fma Multiply the lower doubleprecision (64bit) floatingpoint elements in

_mm_fnmsub_ss^{⚠}  x86 and fma Multiply the lower singleprecision (32bit) floatingpoint elements in

_mm_getcsr^{⚠}  x86 and sse Get the unsigned 32bit value of the MXCSR control and status register. 
_mm_hadd_epi16^{⚠}  x86 and ssse3 Horizontally add the adjacent pairs of values contained in 2 packed
128bit vectors of 
_mm_hadd_epi32^{⚠}  x86 and ssse3 Horizontally add the adjacent pairs of values contained in 2 packed
128bit vectors of 
_mm_hadd_pd^{⚠}  x86 and sse3 Horizontally add adjacent pairs of doubleprecision (64bit)
floatingpoint elements in 
_mm_hadd_ps^{⚠}  x86 and sse3 Horizontally add adjacent pairs of singleprecision (32bit)
floatingpoint elements in 
_mm_hadds_epi16^{⚠}  x86 and ssse3 Horizontally add the adjacent pairs of values contained in 2 packed
128bit vectors of 
_mm_hsub_epi16^{⚠}  x86 and ssse3 Horizontally subtract the adjacent pairs of values contained in 2
packed 128bit vectors of 
_mm_hsub_epi32^{⚠}  x86 and ssse3 Horizontally subtract the adjacent pairs of values contained in 2
packed 128bit vectors of 
_mm_hsub_pd^{⚠}  x86 and sse3 Horizontally subtract adjacent pairs of doubleprecision (64bit)
floatingpoint elements in 
_mm_hsub_ps^{⚠}  x86 and sse3 Horizontally subtract adjacent pairs of singleprecision (32bit)
floatingpoint elements in 
_mm_hsubs_epi16^{⚠}  x86 and ssse3 Horizontally subtract the adjacent pairs of values contained in 2
packed 128bit vectors of 
_mm_i32gather_ps^{⚠}  x86 and avx2 Return values from 
_mm_i32gather_pd^{⚠}  x86 and avx2 Return values from 
_mm_i64gather_ps^{⚠}  x86 and avx2 Return values from 
_mm_i64gather_pd^{⚠}  x86 and avx2 Return values from 
_mm_i32gather_epi32^{⚠}  x86 and avx2 Return values from 
_mm_i32gather_epi64^{⚠}  x86 and avx2 Return values from 
_mm_i64gather_epi32^{⚠}  x86 and avx2 Return values from 
_mm_i64gather_epi64^{⚠}  x86 and avx2 Return values from 
_mm_insert_epi8^{⚠}  x86 and sse4.1 Return a copy of 
_mm_insert_epi16^{⚠}  x86 and sse2 Return a new vector where the 
_mm_insert_epi32^{⚠}  x86 and sse4.1 Return a copy of 
_mm_insert_ps^{⚠}  x86 and sse4.1 Select a single value in 
_mm_insert_si64^{⚠}  x86 and sse4a Inserts the 
_mm_lddqu_si128^{⚠}  x86 and sse3 Load 128bits of integer data from unaligned memory.
This intrinsic may perform better than 
_mm_lfence^{⚠}  x86 and sse2 Perform a serializing operation on all loadfrommemory instructions that were issued prior to this instruction. 
_mm_load1_ps^{⚠}  x86 and sse Construct a 
_mm_load1_pd^{⚠}  x86 and sse2 Load a doubleprecision (64bit) floatingpoint element from memory into both elements of returned vector. 
_mm_load_pd^{⚠}  x86 and sse2 Load 128bits (composed of 2 packed doubleprecision (64bit)
floatingpoint elements) from memory into the returned vector.

_mm_load_pd1^{⚠}  x86 and sse2 Load a doubleprecision (64bit) floatingpoint element from memory into both elements of returned vector. 
_mm_load_ps^{⚠}  x86 and sse Load four 
_mm_load_ps1^{⚠}  x86 and sse Alias for 
_mm_load_sd^{⚠}  x86 and sse2 Loads a 64bit doubleprecision value to the low element of a 128bit vector and clears the upper element. 
_mm_load_si128^{⚠}  x86 and sse2 Load 128bits of integer data from memory into a new vector. 
_mm_load_ss^{⚠}  x86 and sse Construct a 
_mm_loaddup_pd^{⚠}  x86 and sse3 Load a doubleprecision (64bit) floatingpoint element from memory into both elements of return vector. 
_mm_loadh_pd^{⚠}  x86 and sse2 Loads a doubleprecision value into the highorder bits of a 128bit
vector of 
_mm_loadl_epi64^{⚠}  x86 and sse2 Load 64bit integer from memory into first element of returned vector. 
_mm_loadl_pd^{⚠}  x86 and sse2 Loads a doubleprecision value into the loworder bits of a 128bit
vector of 
_mm_loadr_pd^{⚠}  x86 and sse2 Load 2 doubleprecision (64bit) floatingpoint elements from memory into
the returned vector in reverse order. 
_mm_loadr_ps^{⚠}  x86 and sse Load four 
_mm_loadu_pd^{⚠}  x86 and sse2 Load 128bits (composed of 2 packed doubleprecision (64bit)
floatingpoint elements) from memory into the returned vector.

_mm_loadu_ps^{⚠}  x86 and sse Load four 
_mm_loadu_si128^{⚠}  x86 and sse2 Load 128bits of integer data from memory into a new vector. 
_mm_madd_epi16^{⚠}  x86 and sse2 Multiply and then horizontally add signed 16 bit integers in 
_mm_maddubs_epi16^{⚠}  x86 and ssse3 Multiply corresponding pairs of packed 8bit unsigned integer values contained in the first source operand and packed 8bit signed integer values contained in the second source operand, add pairs of contiguous products with signed saturation, and write the 16bit sums to the corresponding bits in the destination. 
_mm_mask_i32gather_ps^{⚠}  x86 and avx2 Return values from 
_mm_mask_i32gather_pd^{⚠}  x86 and avx2 Return values from 
_mm_mask_i64gather_ps^{⚠}  x86 and avx2 Return values from 
_mm_mask_i64gather_pd^{⚠}  x86 and avx2 Return values from 
_mm_mask_i32gather_epi32^{⚠}  x86 and avx2 Return values from 
_mm_mask_i32gather_epi64^{⚠}  x86 and avx2 Return values from 
_mm_mask_i64gather_epi32^{⚠}  x86 and avx2 Return values from 
_mm_mask_i64gather_epi64^{⚠}  x86 and avx2 Return values from 
_mm_maskload_epi32^{⚠}  x86 and avx2 Load packed 32bit integers from memory pointed by 
_mm_maskload_epi64^{⚠}  x86 and avx2 Load packed 64bit integers from memory pointed by 
_mm_maskload_pd^{⚠}  x86 and avx Load packed doubleprecision (64bit) floatingpoint elements from memory
into result using 
_mm_maskload_ps^{⚠}  x86 and avx Load packed singleprecision (32bit) floatingpoint elements from memory
into result using 
_mm_maskmoveu_si128^{⚠}  x86 and sse2 Conditionally store 8bit integer elements from 
_mm_maskstore_epi32^{⚠}  x86 and avx2 Store packed 32bit integers from 
_mm_maskstore_epi64^{⚠}  x86 and avx2 Store packed 64bit integers from 
_mm_maskstore_pd^{⚠}  x86 and avx Store packed doubleprecision (64bit) floatingpoint elements from 
_mm_maskstore_ps^{⚠}  x86 and avx Store packed singleprecision (32bit) floatingpoint elements from 
_mm_max_epi8^{⚠}  x86 and sse4.1 Compare packed 8bit integers in 
_mm_max_epi16^{⚠}  x86 and sse2 Compare packed 16bit integers in 
_mm_max_epi32^{⚠}  x86 and sse4.1 Compare packed 32bit integers in 
_mm_max_epu8^{⚠}  x86 and sse2 Compare packed unsigned 8bit integers in 
_mm_max_epu16^{⚠}  x86 and sse4.1 Compare packed unsigned 16bit integers in 
_mm_max_epu32^{⚠}  x86 and sse4.1 Compare packed unsigned 32bit integers in 
_mm_max_pd^{⚠}  x86 and sse2 Return a new vector with the maximum values from corresponding elements in

_mm_max_ps^{⚠}  x86 and sse Compare packed singleprecision (32bit) floatingpoint elements in 
_mm_max_sd^{⚠}  x86 and sse2 Return a new vector with the low element of 
_mm_max_ss^{⚠}  x86 and sse Compare the first singleprecision (32bit) floatingpoint element of 
_mm_mfence^{⚠}  x86 and sse2 Perform a serializing operation on all loadfrommemory and storetomemory instructions that were issued prior to this instruction. 
_mm_min_epi8^{⚠}  x86 and sse4.1 Compare packed 8bit integers in 
_mm_min_epi16^{⚠}  x86 and sse2 Compare packed 16bit integers in 
_mm_min_epi32^{⚠}  x86 and sse4.1 Compare packed 32bit integers in 
_mm_min_epu8^{⚠}  x86 and sse2 Compare packed unsigned 8bit integers in 
_mm_min_epu16^{⚠}  x86 and sse4.1 Compare packed unsigned 16bit integers in 
_mm_min_epu32^{⚠}  x86 and sse4.1 Compare packed unsigned 32bit integers in 
_mm_min_pd^{⚠}  x86 and sse2 Return a new vector with the minimum values from corresponding elements in

_mm_min_ps^{⚠}  x86 and sse Compare packed singleprecision (32bit) floatingpoint elements in 
_mm_min_sd^{⚠}  x86 and sse2 Return a new vector with the low element of 
_mm_min_ss^{⚠}  x86 and sse Compare the first singleprecision (32bit) floatingpoint element of 
_mm_minpos_epu16^{⚠}  x86 and sse4.1 Finds the minimum unsigned 16bit element in the 128bit __m128i vector, returning a vector containing its value in its first position, and its index in its second position; all other elements are set to zero. 
_mm_move_epi64^{⚠}  x86 and sse2 Return a vector where the low element is extracted from 
_mm_move_sd^{⚠}  x86 and sse2 Constructs a 128bit floatingpoint vector of 
_mm_move_ss^{⚠}  x86 and sse Return a 
_mm_movedup_pd^{⚠}  x86 and sse3 Duplicate the low doubleprecision (64bit) floatingpoint element
from 
_mm_movehdup_ps^{⚠}  x86 and sse3 Duplicate oddindexed singleprecision (32bit) floatingpoint elements
from 
_mm_movehl_ps^{⚠}  x86 and sse Combine higher half of 
_mm_moveldup_ps^{⚠}  x86 and sse3 Duplicate evenindexed singleprecision (32bit) floatingpoint elements
from 
_mm_movelh_ps^{⚠}  x86 and sse Combine lower half of 
_mm_movemask_epi8^{⚠}  x86 and sse2 Return a mask of the most significant bit of each element in 
_mm_movemask_pd^{⚠}  x86 and sse2 Return a mask of the most significant bit of each element in 
_mm_movemask_ps^{⚠}  x86 and sse Return a mask of the most significant bit of each element in 
_mm_mpsadbw_epu8^{⚠}  x86 and sse4.1 Subtracts 8-bit unsigned integer values, computes the absolute values of the differences, and returns sums of those absolute differences as selected by the bit fields in the immediate operand. 
_mm_mul_epi32^{⚠}  x86 and sse4.1 Multiply the low 32bit integers from each packed 64bit
element in 
_mm_mul_epu32^{⚠}  x86 and sse2 Multiply the low unsigned 32bit integers from each packed 64bit element
in 
_mm_mul_pd^{⚠}  x86 and sse2 Multiply packed doubleprecision (64bit) floatingpoint elements in 
_mm_mul_ps^{⚠}  x86 and sse Multiplies __m128 vectors. 
_mm_mul_sd^{⚠}  x86 and sse2 Return a new vector with the low element of 
_mm_mul_ss^{⚠}  x86 and sse Multiplies the first component of 
_mm_mulhi_epi16^{⚠}  x86 and sse2 Multiply the packed 16bit integers in 
_mm_mulhi_epu16^{⚠}  x86 and sse2 Multiply the packed unsigned 16bit integers in 
_mm_mulhrs_epi16^{⚠}  x86 and ssse3 Multiply packed 16-bit signed integer values, truncate the 32-bit
product to the 18 most significant bits by right-shifting, round the
truncated value by adding 1, and write bits 
_mm_mullo_epi16^{⚠}  x86 and sse2 Multiply the packed 16bit integers in 
_mm_mullo_epi32^{⚠}  x86 and sse4.1 Multiply the packed 32bit integers in 
_mm_or_pd^{⚠}  x86 and sse2 Compute the bitwise OR of 
_mm_or_ps^{⚠}  x86 and sse Bitwise OR of packed singleprecision (32bit) floatingpoint elements. 
_mm_or_si128^{⚠}  x86 and sse2 Compute the bitwise OR of 128 bits (representing integer data) in 
_mm_packs_epi16^{⚠}  x86 and sse2 Convert packed 16bit integers from 
_mm_packs_epi32^{⚠}  x86 and sse2 Convert packed 32bit integers from 
_mm_packus_epi16^{⚠}  x86 and sse2 Convert packed 16bit integers from 
_mm_packus_epi32^{⚠}  x86 and sse4.1 Convert packed 32bit integers from 
_mm_pause^{⚠}  x86 and sse2 Provide a hint to the processor that the code sequence is a spin-wait loop. 
_mm_permute_pd^{⚠}  x86 and avx,sse2 Shuffle doubleprecision (64bit) floatingpoint elements in 
_mm_permute_ps^{⚠}  x86 and avx,sse Shuffle singleprecision (32bit) floatingpoint elements in 
_mm_permutevar_pd^{⚠}  x86 and avx Shuffle doubleprecision (64bit) floatingpoint elements in 
_mm_permutevar_ps^{⚠}  x86 and avx Shuffle singleprecision (32bit) floatingpoint elements in 
_mm_prefetch^{⚠}  x86 and sse Fetch the cache line that contains address 
_mm_rcp_ps^{⚠}  x86 and sse Return the approximate reciprocal of packed singleprecision (32bit)
floatingpoint elements in 
_mm_rcp_ss^{⚠}  x86 and sse Return the approximate reciprocal of the first singleprecision
(32bit) floatingpoint element in 
_mm_round_pd^{⚠}  x86 and sse4.1 Round the packed doubleprecision (64bit) floatingpoint elements in 
_mm_round_ps^{⚠}  x86 and sse4.1 Round the packed singleprecision (32bit) floatingpoint elements in 
_mm_round_sd^{⚠}  x86 and sse4.1 Round the lower doubleprecision (64bit) floatingpoint element in 
_mm_round_ss^{⚠}  x86 and sse4.1 Round the lower singleprecision (32bit) floatingpoint element in 
_mm_rsqrt_ps^{⚠}  x86 and sse Return the approximate reciprocal square root of packed singleprecision
(32bit) floatingpoint elements in 
_mm_rsqrt_ss^{⚠}  x86 and sse Return the approximate reciprocal square root of the first single-precision
(32-bit) floating-point element in 
_mm_sad_epu8^{⚠}  x86 and sse2 Sum the absolute differences of packed unsigned 8bit integers. 
_mm_set1_ps^{⚠}  x86 and sse Construct a 
_mm_set1_pd^{⚠}  x86 and sse2 Broadcast doubleprecision (64bit) floatingpoint value a to all elements of the return value. 
_mm_set1_epi8^{⚠}  x86 and sse2 Broadcast 8bit integer 
_mm_set1_epi16^{⚠}  x86 and sse2 Broadcast 16bit integer 
_mm_set1_epi32^{⚠}  x86 and sse2 Broadcast 32bit integer 
_mm_set1_epi64x^{⚠}  x86 and sse2 Broadcast 64bit integer 
_mm_set_epi8^{⚠}  x86 and sse2 Set packed 8bit integers with the supplied values. 
_mm_set_epi16^{⚠}  x86 and sse2 Set packed 16bit integers with the supplied values. 
_mm_set_epi32^{⚠}  x86 and sse2 Set packed 32bit integers with the supplied values. 
_mm_set_epi64x^{⚠}  x86 and sse2 Set packed 64bit integers with the supplied values, from highest to lowest. 
_mm_set_pd^{⚠}  x86 and sse2 Set packed doubleprecision (64bit) floatingpoint elements in the return value with the supplied values. 
_mm_set_pd1^{⚠}  x86 and sse2 Broadcast doubleprecision (64bit) floatingpoint value a to all elements of the return value. 
_mm_set_ps^{⚠}  x86 and sse Construct a 
_mm_set_ps1^{⚠}  x86 and sse Alias for 
_mm_set_sd^{⚠}  x86 and sse2 Copy doubleprecision (64bit) floatingpoint element 
_mm_set_ss^{⚠}  x86 and sse Construct a 
_mm_setcsr^{⚠}  x86 and sse Set the MXCSR register with the 32bit unsigned integer value. 
_mm_setr_epi8^{⚠}  x86 and sse2 Set packed 8bit integers with the supplied values in reverse order. 
_mm_setr_epi16^{⚠}  x86 and sse2 Set packed 16bit integers with the supplied values in reverse order. 
_mm_setr_epi32^{⚠}  x86 and sse2 Set packed 32bit integers with the supplied values in reverse order. 
_mm_setr_pd^{⚠}  x86 and sse2 Set packed doubleprecision (64bit) floatingpoint elements in the return value with the supplied values in reverse order. 
_mm_setr_ps^{⚠}  x86 and sse Construct a 
_mm_setzero_pd^{⚠}  x86 and sse2 Returns packed doubleprecision (64bit) floatingpoint elements with all zeros. 
_mm_setzero_ps^{⚠}  x86 and sse Construct a 
_mm_setzero_si128^{⚠}  x86 and sse2 Returns a vector with all elements set to zero. 
_mm_sfence^{⚠}  x86 and sse Perform a serializing operation on all store-to-memory instructions that were issued prior to this instruction. 
_mm_sha1msg1_epu32^{⚠}  x86 and sha Perform an intermediate calculation for the next four SHA1 message values
(unsigned 32bit integers) using previous message values from 
_mm_sha1msg2_epu32^{⚠}  x86 and sha Perform the final calculation for the next four SHA1 message values
(unsigned 32bit integers) using the intermediate result in 
_mm_sha1nexte_epu32^{⚠}  x86 and sha Calculate SHA1 state variable E after four rounds of operation from the
current SHA1 state variable 
_mm_sha1rnds4_epu32^{⚠}  x86 and sha Perform four rounds of SHA1 operation using an initial SHA1 state (A,B,C,D)
from 
_mm_sha256msg1_epu32^{⚠}  x86 and sha Perform an intermediate calculation for the next four SHA256 message values
(unsigned 32bit integers) using previous message values from 
_mm_sha256msg2_epu32^{⚠}  x86 and sha Perform the final calculation for the next four SHA256 message values
(unsigned 32bit integers) using previous message values from 
_mm_sha256rnds2_epu32^{⚠}  x86 and sha Perform 2 rounds of SHA256 operation using an initial SHA256 state
(C,D,G,H) from 
_mm_shuffle_epi8^{⚠}  x86 and ssse3 Shuffle bytes from 
_mm_shuffle_epi32^{⚠}  x86 and sse2 Shuffle 32bit integers in 
_mm_shuffle_pd^{⚠}  x86 and sse2 Constructs a 128bit floatingpoint vector of 
_mm_shuffle_ps^{⚠}  x86 and sse Shuffle packed singleprecision (32bit) floatingpoint elements in 
_mm_shufflehi_epi16^{⚠}  x86 and sse2 Shuffle 16bit integers in the high 64 bits of 
_mm_shufflelo_epi16^{⚠}  x86 and sse2 Shuffle 16bit integers in the low 64 bits of 
_mm_sign_epi8^{⚠}  x86 and ssse3 Negate packed 8bit integers in 
_mm_sign_epi16^{⚠}  x86 and ssse3 Negate packed 16bit integers in 
_mm_sign_epi32^{⚠}  x86 and ssse3 Negate packed 32bit integers in 
_mm_sll_epi16^{⚠}  x86 and sse2 Shift packed 16bit integers in 
_mm_sll_epi32^{⚠}  x86 and sse2 Shift packed 32bit integers in 
_mm_sll_epi64^{⚠}  x86 and sse2 Shift packed 64bit integers in 
_mm_slli_epi16^{⚠}  x86 and sse2 Shift packed 16bit integers in 
_mm_slli_epi32^{⚠}  x86 and sse2 Shift packed 32bit integers in 
_mm_slli_epi64^{⚠}  x86 and sse2 Shift packed 64bit integers in 
_mm_slli_si128^{⚠}  x86 and sse2 Shift 
_mm_sllv_epi32^{⚠}  x86 and avx2 Shift packed 32bit integers in 
_mm_sllv_epi64^{⚠}  x86 and avx2 Shift packed 64bit integers in 
_mm_sqrt_pd^{⚠}  x86 and sse2 Return a new vector with the square root of each of the values in 
_mm_sqrt_ps^{⚠}  x86 and sse Return the square root of packed singleprecision (32bit) floatingpoint
elements in 
_mm_sqrt_sd^{⚠}  x86 and sse2 Return a new vector with the low element of 
_mm_sqrt_ss^{⚠}  x86 and sse Return the square root of the first singleprecision (32bit)
floatingpoint element in 
_mm_sra_epi16^{⚠}  x86 and sse2 Shift packed 16bit integers in 
_mm_sra_epi32^{⚠}  x86 and sse2 Shift packed 32bit integers in 
_mm_srai_epi16^{⚠}  x86 and sse2 Shift packed 16bit integers in 
_mm_srai_epi32^{⚠}  x86 and sse2 Shift packed 32bit integers in 
_mm_srav_epi32^{⚠}  x86 and avx2 Shift packed 32bit integers in 
_mm_srl_epi16^{⚠}  x86 and sse2 Shift packed 16bit integers in 
_mm_srl_epi32^{⚠}  x86 and sse2 Shift packed 32bit integers in 
_mm_srl_epi64^{⚠}  x86 and sse2 Shift packed 64bit integers in 
_mm_srli_epi16^{⚠}  x86 and sse2 Shift packed 16bit integers in 
_mm_srli_epi32^{⚠}  x86 and sse2 Shift packed 32bit integers in 
_mm_srli_epi64^{⚠}  x86 and sse2 Shift packed 64bit integers in 
_mm_srli_si128^{⚠}  x86 and sse2 Shift 
_mm_srlv_epi32^{⚠}  x86 and avx2 Shift packed 32bit integers in 
_mm_srlv_epi64^{⚠}  x86 and avx2 Shift packed 64bit integers in 
_mm_store1_ps^{⚠}  x86 and sse Store the lowest 32 bit float of 
_mm_store1_pd^{⚠}  x86 and sse2 Store the lower doubleprecision (64bit) floatingpoint element from 
_mm_store_pd^{⚠}  x86 and sse2 Store 128bits (composed of 2 packed doubleprecision (64bit)
floatingpoint elements) from 
_mm_store_pd1^{⚠}  x86 and sse2 Store the lower doubleprecision (64bit) floatingpoint element from 
_mm_store_ps^{⚠}  x86 and sse Store four 32bit floats into aligned memory. 
_mm_store_ps1^{⚠}  x86 and sse Alias for 
_mm_store_sd^{⚠}  x86 and sse2 Stores the lower 64 bits of a 128bit vector of 
_mm_store_si128^{⚠}  x86 and sse2 Store 128bits of integer data from 
_mm_store_ss^{⚠}  x86 and sse Store the lowest 32 bit float of 
_mm_storeh_pd^{⚠}  x86 and sse2 Stores the upper 64 bits of a 128bit vector of 
_mm_storel_epi64^{⚠}  x86 and sse2 Store the lower 64bit integer 
_mm_storel_pd^{⚠}  x86 and sse2 Stores the lower 64 bits of a 128bit vector of 
_mm_storer_pd^{⚠}  x86 and sse2 Store 2 doubleprecision (64bit) floatingpoint elements from 
_mm_storer_ps^{⚠}  x86 and sse Store four 32bit floats into aligned memory in reverse order. 
_mm_storeu_pd^{⚠}  x86 and sse2 Store 128bits (composed of 2 packed doubleprecision (64bit)
floatingpoint elements) from 
_mm_storeu_ps^{⚠}  x86 and sse Store four 32bit floats into memory. There are no restrictions on memory
alignment. For aligned memory 
_mm_storeu_si128^{⚠}  x86 and sse2 Store 128bits of integer data from 
_mm_stream_pd^{⚠}  x86 and sse2 Stores a 128bit floating point vector of 
_mm_stream_ps^{⚠}  x86 and sse Stores 
_mm_stream_sd^{⚠}  x86 and sse4a Nontemporal store of 
_mm_stream_si32^{⚠}  x86 and sse2 Stores a 32bit integer value in the specified memory location. To minimize caching, the data is flagged as nontemporal (unlikely to be used again soon). 
_mm_stream_si128^{⚠}  x86 and sse2 Stores a 128bit integer vector to a 128bit aligned memory location. To minimize caching, the data is flagged as nontemporal (unlikely to be used again soon). 
_mm_stream_ss^{⚠}  x86 and sse4a Nontemporal store of 
_mm_sub_epi8^{⚠}  x86 and sse2 Subtract packed 8bit integers in 
_mm_sub_epi16^{⚠}  x86 and sse2 Subtract packed 16bit integers in 
_mm_sub_epi32^{⚠}  x86 and sse2 Subtract packed 32bit integers in 
_mm_sub_epi64^{⚠}  x86 and sse2 Subtract packed 64bit integers in 
_mm_sub_pd^{⚠}  x86 and sse2 Subtract packed doubleprecision (64bit) floatingpoint elements in 
_mm_sub_ps^{⚠}  x86 and sse Subtracts __m128 vectors. 
_mm_sub_sd^{⚠}  x86 and sse2 Return a new vector with the low element of 
_mm_sub_ss^{⚠}  x86 and sse Subtracts the first component of 
_mm_subs_epi8^{⚠}  x86 and sse2 Subtract packed 8bit integers in 
_mm_subs_epi16^{⚠}  x86 and sse2 Subtract packed 16bit integers in 
_mm_subs_epu8^{⚠}  x86 and sse2 Subtract packed unsigned 8bit integers in 
_mm_subs_epu16^{⚠}  x86 and sse2 Subtract packed unsigned 16bit integers in 
_mm_test_all_ones^{⚠}  x86 and sse4.1 Tests whether the specified bits in 
_mm_test_all_zeros^{⚠}  x86 and sse4.1 Tests whether the specified bits in a 128bit integer vector are all zeros. 
_mm_test_mix_ones_zeros^{⚠}  x86 and sse4.1 Tests whether the specified bits in a 128bit integer vector are neither all zeros nor all ones. 
_mm_testc_pd^{⚠}  x86 and avx Compute the bitwise AND of 128 bits (representing doubleprecision (64bit)
floatingpoint elements) in 
_mm_testc_ps^{⚠}  x86 and avx Compute the bitwise AND of 128 bits (representing singleprecision (32bit)
floatingpoint elements) in 
_mm_testc_si128^{⚠}  x86 and sse4.1 Tests whether the specified bits in a 128bit integer vector are all ones. 
_mm_testnzc_pd^{⚠}  x86 and avx Compute the bitwise AND of 128 bits (representing doubleprecision (64bit)
floatingpoint elements) in 
_mm_testnzc_ps^{⚠}  x86 and avx Compute the bitwise AND of 128 bits (representing singleprecision (32bit)
floatingpoint elements) in 
_mm_testnzc_si128^{⚠}  x86 and sse4.1 Tests whether the specified bits in a 128bit integer vector are neither all zeros nor all ones. 
_mm_testz_pd^{⚠}  x86 and avx Compute the bitwise AND of 128 bits (representing doubleprecision (64bit)
floatingpoint elements) in 
_mm_testz_ps^{⚠}  x86 and avx Compute the bitwise AND of 128 bits (representing singleprecision (32bit)
floatingpoint elements) in 
_mm_testz_si128^{⚠}  x86 and sse4.1 Tests whether the specified bits in a 128bit integer vector are all zeros. 
_mm_tzcnt_32^{⚠}  x86 and bmi1 Counts the number of trailing least significant zero bits. 
_mm_ucomieq_sd^{⚠}  x86 and sse2 Compare the lower element of 
_mm_ucomieq_ss^{⚠}  x86 and sse Compare two 32bit floats from the loworder bits of 
_mm_ucomige_sd^{⚠}  x86 and sse2 Compare the lower element of 
_mm_ucomige_ss^{⚠}  x86 and sse Compare two 32bit floats from the loworder bits of 
_mm_ucomigt_sd^{⚠}  x86 and sse2 Compare the lower element of 
_mm_ucomigt_ss^{⚠}  x86 and sse Compare two 32bit floats from the loworder bits of 
_mm_ucomile_sd^{⚠}  x86 and sse2 Compare the lower element of 
_mm_ucomile_ss^{⚠}  x86 and sse Compare two 32bit floats from the loworder bits of 
_mm_ucomilt_sd^{⚠}  x86 and sse2 Compare the lower element of 
_mm_ucomilt_ss^{⚠}  x86 and sse Compare two 32bit floats from the loworder bits of 
_mm_ucomineq_sd^{⚠}  x86 and sse2 Compare the lower element of 
_mm_ucomineq_ss^{⚠}  x86 and sse Compare two 32bit floats from the loworder bits of 
_mm_undefined_pd^{⚠}  x86 and sse2 Return vector of type __m128d with undefined elements. 
_mm_undefined_ps^{⚠}  x86 and sse Return vector of type __m128 with undefined elements. 
_mm_undefined_si128^{⚠}  x86 and sse2 Return vector of type __m128i with undefined elements. 
_mm_unpackhi_epi8^{⚠}  x86 and sse2 Unpack and interleave 8bit integers from the high half of 
_mm_unpackhi_epi16^{⚠}  x86 and sse2 Unpack and interleave 16bit integers from the high half of 
_mm_unpackhi_epi32^{⚠}  x86 and sse2 Unpack and interleave 32bit integers from the high half of 
_mm_unpackhi_epi64^{⚠}  x86 and sse2 Unpack and interleave 64bit integers from the high half of 
_mm_unpackhi_pd^{⚠}  x86 and sse2 The resulting 
_mm_unpackhi_ps^{⚠}  x86 and sse Unpack and interleave singleprecision (32bit) floatingpoint elements
from the higher half of 
_mm_unpacklo_epi8^{⚠}  x86 and sse2 Unpack and interleave 8bit integers from the low half of 
_mm_unpacklo_epi16^{⚠}  x86 and sse2 Unpack and interleave 16bit integers from the low half of 
_mm_unpacklo_epi32^{⚠}  x86 and sse2 Unpack and interleave 32bit integers from the low half of 
_mm_unpacklo_epi64^{⚠}  x86 and sse2 Unpack and interleave 64bit integers from the low half of 
_mm_unpacklo_pd^{⚠}  x86 and sse2 The resulting 
_mm_unpacklo_ps^{⚠}  x86 and sse Unpack and interleave singleprecision (32bit) floatingpoint elements
from the lower half of 
_mm_xor_pd^{⚠}  x86 and sse2 Compute the bitwise XOR of 
_mm_xor_ps^{⚠}  x86 and sse Bitwise exclusive OR of packed singleprecision (32bit) floatingpoint elements. 
_mm_xor_si128^{⚠}  x86 and sse2 Compute the bitwise XOR of 128 bits (representing integer data) in 
_mulx_u32^{⚠}  x86 and bmi2 Unsigned multiply without affecting flags. 
_pdep_u32^{⚠}  x86 and bmi2 Scatter contiguous low order bits of 
_pext_u32^{⚠}  x86 and bmi2 Gathers the bits of 
_popcnt32^{⚠}  x86 and popcnt Counts the bits that are set. 
_rdrand16_step^{⚠}  x86 and rdrand Read a hardware generated 16bit random value and store the result in val. Return 1 if a random value was generated, and 0 otherwise. 
_rdrand32_step^{⚠}  x86 and rdrand Read a hardware generated 32bit random value and store the result in val. Return 1 if a random value was generated, and 0 otherwise. 
_rdseed16_step^{⚠}  x86 and rdseed Read a 16-bit NIST SP 800-90B and SP 800-90C compliant random value and store in val. Return 1 if a random value was generated, and 0 otherwise. 
_rdseed32_step^{⚠}  x86 and rdseed Read a 32-bit NIST SP 800-90B and SP 800-90C compliant random value and store in val. Return 1 if a random value was generated, and 0 otherwise. 
_rdtsc^{⚠}  x86 Reads the current value of the processor’s timestamp counter. 
_subborrow_u32^{⚠}  x86 Subtract unsigned 32-bit integer b from a with unsigned 8-bit borrow-in 
_t1mskc_u32^{⚠}  x86 and tbm Clears all bits below the least significant zero of 
_t1mskc_u64^{⚠}  x86 and tbm Clears all bits below the least significant zero of 
_tzcnt_u32^{⚠}  x86 and bmi1 Counts the number of trailing least significant zero bits. 
_tzmsk_u32^{⚠}  x86 and tbm Sets all bits below the least significant one of 
_tzmsk_u64^{⚠}  x86 and tbm Sets all bits below the least significant one of 
_xgetbv^{⚠}  x86 and xsave Reads the contents of the extended control register 
_xrstor^{⚠}  x86 and xsave Perform a full or partial restore of the enabled processor states using
the state information stored in memory at 
_xrstors^{⚠}  x86 and xsave,xsaves Perform a full or partial restore of the enabled processor states using the
state information stored in memory at 
_xsave^{⚠}  x86 and xsave Perform a full or partial save of the enabled processor states to memory at

_xsavec^{⚠}  x86 and xsave,xsavec Perform a full or partial save of the enabled processor states to memory
at 
_xsaveopt^{⚠}  x86 and xsave,xsaveopt Perform a full or partial save of the enabled processor states to memory at

_xsaves^{⚠}  x86 and xsave,xsaves Perform a full or partial save of the enabled processor states to memory at

_xsetbv^{⚠}  x86 and xsave Copy 64bits from 
_MM_SHUFFLE  Experimentalx86 A utility function for creating masks to use with Intel shuffle and permute intrinsics. 
_m_empty^{⚠}  Experimentalx86 and mmx Empty the MMX state, which marks the x87 FPU registers as available for use by x87 instructions. This instruction must be used at the end of all MMX technology procedures. 
_m_maskmovq^{⚠}  Experimentalx86 and sse,mmx Conditionally copies the values from each 8bit element in the first 64bit integer vector operand to the specified memory location, as specified by the most significant bit in the corresponding element in the second 64bit integer vector operand. 
_m_paddb^{⚠}  Experimentalx86 and mmx Add packed 8bit integers in 
_m_paddd^{⚠}  Experimentalx86 and mmx Add packed 32bit integers in 
_m_paddsb^{⚠}  Experimentalx86 and mmx Add packed 8bit integers in 
_m_paddsw^{⚠}  Experimentalx86 and mmx Add packed 16bit integers in 
_m_paddusb^{⚠}  Experimentalx86 and mmx Add packed unsigned 8bit integers in 
_m_paddusw^{⚠}  Experimentalx86 and mmx Add packed unsigned 16bit integers in 
_m_paddw^{⚠}  Experimentalx86 and mmx Add packed 16bit integers in 
_m_pavgb^{⚠}  Experimentalx86 and sse,mmx Computes the rounded averages of the packed unsigned 8bit integer values and writes the averages to the corresponding bits in the destination. 
_m_pavgw^{⚠}  Experimentalx86 and sse,mmx Computes the rounded averages of the packed unsigned 16bit integer values and writes the averages to the corresponding bits in the destination. 
_m_pextrw^{⚠}  Experimentalx86 and sse,mmx Extracts 16bit element from a 64bit vector of 
_m_pinsrw^{⚠}  Experimentalx86 and sse,mmx Copies data from the 64bit vector of 
_m_pmaxsw^{⚠}  Experimentalx86 and sse,mmx Compares the packed 16bit signed integers of 
_m_pmaxub^{⚠}  Experimentalx86 and sse,mmx Compares the packed 8-bit unsigned integers of 
_m_pminsw^{⚠}  Experimentalx86 and sse,mmx Compares the packed 16bit signed integers of 
_m_pminub^{⚠}  Experimentalx86 and sse,mmx Compares the packed 8-bit unsigned integers of 
_m_pmovmskb^{⚠}  Experimentalx86 and sse,mmx Takes the most significant bit from each 8-bit element in a 64-bit integer vector to create an 8-bit mask value. Zero-extends the value to a 32-bit integer and writes it to the destination. 
_m_pmulhuw^{⚠}  Experimentalx86 and sse,mmx Multiplies packed 16-bit unsigned integer values and writes the high-order 16 bits of each 32-bit product to the corresponding bits in the destination. 
_m_psadbw^{⚠}  Experimentalx86 and sse,mmx Subtracts the corresponding 8-bit unsigned integer values of the two
64-bit vector operands and computes the absolute value of each
difference. Then the sum of the 8 absolute differences is written to the
bits 
_m_pshufw^{⚠}  Experimentalx86 and sse,mmx Shuffles the 4 16bit integers from a 64bit integer vector to the destination, as specified by the immediate value operand. 
_m_psubb^{⚠}  Experimentalx86 and mmx Subtract packed 8bit integers in 
_m_psubd^{⚠}  Experimentalx86 and mmx Subtract packed 32bit integers in 
_m_psubsb^{⚠}  Experimentalx86 and mmx Subtract packed 8bit integers in 
_m_psubsw^{⚠}  Experimentalx86 and mmx Subtract packed 16bit integers in 
_m_psubusb^{⚠}  Experimentalx86 and mmx Subtract packed unsigned 8bit integers in 
_m_psubusw^{⚠}  Experimentalx86 and mmx Subtract packed unsigned 16bit integers in 
_m_psubw^{⚠}  Experimentalx86 and mmx Subtract packed 16bit integers in 
_mm256_madd52hi_epu64^{⚠}  Experimentalx86 and avx512ifma,avx512vl Multiply packed unsigned 52bit integers in each 64bit element of

_mm256_madd52lo_epu64^{⚠}  Experimentalx86 and avx512ifma,avx512vl Multiply packed unsigned 52bit integers in each 64bit element of

_mm512_abs_epi32^{⚠}  Experimentalx86 and avx512f Computes the absolute values of packed 32bit integers in 
_mm512_madd52hi_epu64^{⚠}  Experimentalx86 and avx512ifma Multiply packed unsigned 52bit integers in each 64bit element of

_mm512_madd52lo_epu64^{⚠}  Experimentalx86 and avx512ifma Multiply packed unsigned 52bit integers in each 64bit element of

_mm512_mask_abs_epi32^{⚠}  Experimentalx86 and avx512f Compute the absolute value of packed 32bit integers in 
_mm512_maskz_abs_epi32^{⚠}  Experimentalx86 and avx512f Compute the absolute value of packed 32bit integers in 
_mm512_set1_epi64^{⚠}  Experimentalx86 and avx512f Broadcast 64bit integer 
_mm512_setr_epi32^{⚠}  Experimentalx86 and avx512f Set packed 32bit integers in 
_mm512_setzero_si512^{⚠}  Experimentalx86 and avx512f Return vector of type 
_mm_abs_pi8^{⚠}  Experimentalx86 and ssse3,mmx Compute the absolute value of packed 8bit integers in 
_mm_abs_pi16^{⚠}  Experimentalx86 and ssse3,mmx Compute the absolute value of packed 16-bit integers in 
_mm_abs_pi32^{⚠}  Experimentalx86 and ssse3,mmx Compute the absolute value of packed 32bit integers in 
_mm_add_pi8^{⚠}  Experimentalx86 and mmx Add packed 8bit integers in 
_mm_add_pi16^{⚠}  Experimentalx86 and mmx Add packed 16bit integers in 
_mm_add_pi32^{⚠}  Experimentalx86 and mmx Add packed 32bit integers in 
_mm_add_si64^{⚠}  Experimentalx86 and sse2,mmx Adds two signed or unsigned 64bit integer values, returning the lower 64 bits of the sum. 
_mm_adds_pi8^{⚠}  Experimentalx86 and mmx Add packed 8bit integers in 
_mm_adds_pi16^{⚠}  Experimentalx86 and mmx Add packed 16bit integers in 
_mm_adds_pu8^{⚠}  Experimentalx86 and mmx Add packed unsigned 8bit integers in 
_mm_adds_pu16^{⚠}  Experimentalx86 and mmx Add packed unsigned 16bit integers in 
_mm_alignr_pi8^{⚠}  Experimentalx86 and ssse3,mmx Concatenates the two 64-bit integer vector operands, and right-shifts the result by the number of bytes specified in the immediate operand. 
_mm_avg_pu8^{⚠}  Experimentalx86 and sse,mmx Computes the rounded averages of the packed unsigned 8bit integer values and writes the averages to the corresponding bits in the destination. 
_mm_avg_pu16^{⚠}  Experimentalx86 and sse,mmx Computes the rounded averages of the packed unsigned 16bit integer values and writes the averages to the corresponding bits in the destination. 
_mm_cmpgt_pi8^{⚠}  Experimentalx86 and mmx Compares whether each element of 
_mm_cmpgt_pi16^{⚠}  Experimentalx86 and mmx Compares whether each element of 
_mm_cmpgt_pi32^{⚠}  Experimentalx86 and mmx Compares whether each element of 
_mm_cvt_pi2ps^{⚠}  Experimentalx86 and sse,mmx Converts two elements of a 64bit vector of 
_mm_cvt_ps2pi^{⚠}  Experimentalx86 and sse,mmx Convert the two lower packed singleprecision (32bit) floatingpoint
elements in 
_mm_cvtpd_pi32^{⚠}  Experimentalx86 and sse2,mmx Converts the two doubleprecision floatingpoint elements of a
128bit vector of 
_mm_cvtpi8_ps^{⚠}  Experimentalx86 and sse,mmx Converts the lower 4 8bit values of 
_mm_cvtpi16_ps^{⚠}  Experimentalx86 and sse,mmx Converts a 64bit vector of 
_mm_cvtpi32_ps^{⚠}  Experimentalx86 and sse,mmx Converts two elements of a 64bit vector of 
_mm_cvtpi32_pd^{⚠}  Experimentalx86 and sse2,mmx Converts the two signed 32bit integer elements of a 64bit vector of

_mm_cvtpi32x2_ps^{⚠}  Experimentalx86 and sse,mmx Converts the two 32bit signed integer values from each 64bit vector
operand of 
_mm_cvtps_pi8^{⚠}  Experimentalx86 and sse,mmx Convert packed singleprecision (32bit) floatingpoint elements in 
_mm_cvtps_pi16^{⚠}  Experimentalx86 and sse,mmx Convert packed singleprecision (32bit) floatingpoint elements in 
_mm_cvtps_pi32^{⚠}  Experimentalx86 and sse,mmx Convert the two lower packed singleprecision (32bit) floatingpoint
elements in 
_mm_cvtpu8_ps^{⚠}  Experimentalx86 and sse,mmx Converts the lower 4 8bit values of 
_mm_cvtpu16_ps^{⚠}  Experimentalx86 and sse,mmx Converts a 64bit vector of 
_mm_cvtsi32_si64^{⚠}  Experimentalx86 and mmx Copy 32bit integer 
_mm_cvtsi64_si32^{⚠}  Experimentalx86 and mmx Return the lower 32bit integer in 
_mm_cvtt_ps2pi^{⚠}  Experimentalx86 and sse,mmx Convert the two lower packed singleprecision (32bit) floatingpoint
elements in 
_mm_cvttpd_pi32^{⚠}  Experimentalx86 and sse2,mmx Converts the two doubleprecision floatingpoint elements of a
128bit vector of 
_mm_cvttps_pi32^{⚠}  Experimentalx86 and sse,mmx Convert the two lower packed singleprecision (32bit) floatingpoint
elements in 
_mm_empty^{⚠}  Experimentalx86 and mmx Empty the MMX state, which marks the x87 FPU registers as available for use by x87 instructions. This instruction must be used at the end of all MMX technology procedures. 
_mm_extract_pi16^{⚠}  Experimentalx86 and sse,mmx Extracts 16bit element from a 64bit vector of 
_mm_hadd_pi16^{⚠}  Experimentalx86 and ssse3,mmx Horizontally add the adjacent pairs of values contained in 2 packed
64bit vectors of 
_mm_hadd_pi32^{⚠}  Experimentalx86 and ssse3,mmx Horizontally add the adjacent pairs of values contained in 2 packed
64bit vectors of 
_mm_hadds_pi16^{⚠}  Experimentalx86 and ssse3,mmx Horizontally add the adjacent pairs of values contained in 2 packed
64bit vectors of 
_mm_hsub_pi16^{⚠}  Experimentalx86 and ssse3,mmx Horizontally subtracts the adjacent pairs of values contained in 2
packed 64bit vectors of 
_mm_hsub_pi32^{⚠}  Experimentalx86 and ssse3,mmx Horizontally subtracts the adjacent pairs of values contained in 2
packed 64bit vectors of 
_mm_hsubs_pi16^{⚠}  Experimentalx86 and ssse3,mmx Horizontally subtracts the adjacent pairs of values contained in 2
packed 64bit vectors of 
_mm_insert_pi16^{⚠}  Experimentalx86 and sse,mmx Copies data from the 64bit vector of 
_mm_loadh_pi^{⚠}  Experimentalx86 and sse Set the upper two singleprecision floatingpoint values with 64 bits of
data loaded from the address 
_mm_loadl_pi^{⚠}  Experimentalx86 and sse Load two floats from 
_mm_madd52hi_epu64^{⚠}  Experimentalx86 and avx512ifma,avx512vl Multiply packed unsigned 52bit integers in each 64bit element of

_mm_madd52lo_epu64^{⚠}  Experimentalx86 and avx512ifma,avx512vl Multiply packed unsigned 52bit integers in each 64bit element of

_mm_maddubs_pi16^{⚠}  Experimentalx86 and ssse3,mmx Multiplies corresponding pairs of packed 8bit unsigned integer values contained in the first source operand and packed 8bit signed integer values contained in the second source operand, adds pairs of contiguous products with signed saturation, and writes the 16bit sums to the corresponding bits in the destination. 
_mm_maskmove_si64^{⚠}  Experimentalx86 and sse,mmx Conditionally copies the values from each 8bit element in the first 64bit integer vector operand to the specified memory location, as specified by the most significant bit in the corresponding element in the second 64bit integer vector operand. 
_mm_max_pi16^{⚠}  Experimentalx86 and sse,mmx Compares the packed 16bit signed integers of 
_mm_max_pu8^{⚠}  Experimentalx86 and sse,mmx Compares the packed 8-bit unsigned integers of 
_mm_min_pi16^{⚠}  Experimentalx86 and sse,mmx Compares the packed 16bit signed integers of 
_mm_min_pu8^{⚠}  Experimentalx86 and sse,mmx Compares the packed 8-bit unsigned integers of 
_mm_movemask_pi8^{⚠}  Experimentalx86 and sse,mmx Takes the most significant bit from each 8-bit element in a 64-bit integer vector to create an 8-bit mask value. Zero-extends the value to a 32-bit integer and writes it to the destination. 
_mm_movepi64_pi64^{⚠}  Experimentalx86 and sse2,mmx Returns the lower 64 bits of a 128-bit integer vector as a 64-bit integer. 
_mm_movpi64_epi64^{⚠}  Experimentalx86 and sse2,mmx Moves the 64-bit operand to a 128-bit integer vector, zeroing the upper bits. 
_mm_mul_su32^{⚠}  Experimentalx86 and sse2,mmx Multiplies 32-bit unsigned integer values contained in the lower bits of the two 64-bit integer vectors and returns the 64-bit unsigned product. 
_mm_mulhi_pu16^{⚠}  Experimentalx86 and sse,mmx Multiplies packed 16-bit unsigned integer values and writes the high-order 16 bits of each 32-bit product to the corresponding bits in the destination. 
_mm_mulhrs_pi16^{⚠}  Experimentalx86 and ssse3,mmx Multiplies packed 16-bit signed integer values, truncates the 32-bit
products to the 18 most significant bits by right-shifting, rounds the
truncated value by adding 1, and writes bits 
_mm_mullo_pi16^{⚠}  Experimentalx86 and sse,mmx Multiplies packed 16-bit integer values and writes the low-order 16 bits of each 32-bit product to the corresponding bits in the destination. 
_mm_packs_pi16^{⚠}  Experimentalx86 and mmx Convert packed 16-bit integers from 
_mm_packs_pi32^{⚠}  Experimentalx86 and mmx Convert packed 32-bit integers from 
_mm_sad_pu8^{⚠}  Experimentalx86 and sse,mmx Subtracts the corresponding 8-bit unsigned integer values of the two
64-bit vector operands and computes the absolute value of each
difference. The sum of the 8 absolute differences is then written to the
bits 
_mm_set1_epi64^{⚠}  Experimentalx86 and sse2,mmx Initializes both values in a 128-bit vector of 
_mm_set1_pi8^{⚠}  Experimentalx86 and mmx Broadcast 8-bit integer a to all elements of dst. 
_mm_set1_pi16^{⚠}  Experimentalx86 and mmx Broadcast 16-bit integer a to all elements of dst. 
_mm_set1_pi32^{⚠}  Experimentalx86 and mmx Broadcast 32-bit integer a to all elements of dst. 
_mm_set_epi64^{⚠}  Experimentalx86 and sse2,mmx Initializes both 64-bit values in a 128-bit vector of 
_mm_set_pi8^{⚠}  Experimentalx86 and mmx Set packed 8-bit integers in dst with the supplied values. 
_mm_set_pi16^{⚠}  Experimentalx86 and mmx Set packed 16-bit integers in dst with the supplied values. 
_mm_set_pi32^{⚠}  Experimentalx86 and mmx Set packed 32-bit integers in dst with the supplied values. 
_mm_setr_epi64^{⚠}  Experimentalx86 and sse2,mmx Constructs a 128-bit integer vector, initialized in reverse order with the specified 64-bit integral values. 
_mm_setr_pi8^{⚠}  Experimentalx86 and mmx Set packed 8-bit integers in dst with the supplied values in reverse order. 
_mm_setr_pi16^{⚠}  Experimentalx86 and mmx Set packed 16-bit integers in dst with the supplied values in reverse order. 
_mm_setr_pi32^{⚠}  Experimentalx86 and mmx Set packed 32-bit integers in dst with the supplied values in reverse order. 
_mm_setzero_si64^{⚠}  Experimentalx86 and mmx Constructs a 64-bit integer vector initialized to zero. 
_mm_shuffle_pi8^{⚠}  Experimentalx86 and ssse3,mmx Shuffle packed 8-bit integers in 
_mm_shuffle_pi16^{⚠}  Experimentalx86 and sse,mmx Shuffles the 4 16-bit integers from a 64-bit integer vector to the destination, as specified by the immediate value operand. 
_mm_sign_pi8^{⚠}  Experimentalx86 and ssse3,mmx Negate packed 8-bit integers in 
_mm_sign_pi16^{⚠}  Experimentalx86 and ssse3,mmx Negate packed 16-bit integers in 
_mm_sign_pi32^{⚠}  Experimentalx86 and ssse3,mmx Negate packed 32-bit integers in 
_mm_storeh_pi^{⚠}  Experimentalx86 and sse Store the upper half of 
_mm_storel_pi^{⚠}  Experimentalx86 and sse Store the lower half of 
_mm_stream_pi^{⚠}  Experimentalx86 and sse,mmx Store 64 bits of integer data from a into memory using a non-temporal memory hint. 
_mm_sub_pi8^{⚠}  Experimentalx86 and mmx Subtract packed 8-bit integers in 
_mm_sub_pi16^{⚠}  Experimentalx86 and mmx Subtract packed 16-bit integers in 
_mm_sub_pi32^{⚠}  Experimentalx86 and mmx Subtract packed 32-bit integers in 
_mm_sub_si64^{⚠}  Experimentalx86 and sse2,mmx Subtracts signed or unsigned 64-bit integer values and writes the difference to the corresponding bits in the destination. 
_mm_subs_pi8^{⚠}  Experimentalx86 and mmx Subtract packed 8-bit integers in 
_mm_subs_pi16^{⚠}  Experimentalx86 and mmx Subtract packed 16-bit integers in 
_mm_subs_pu8^{⚠}  Experimentalx86 and mmx Subtract packed unsigned 8-bit integers in 
_mm_subs_pu16^{⚠}  Experimentalx86 and mmx Subtract packed unsigned 16-bit integers in 
_mm_unpackhi_pi8^{⚠}  Experimentalx86 and mmx Unpacks the upper four elements from two 
_mm_unpackhi_pi16^{⚠}  Experimentalx86 and mmx Unpacks the upper two elements from two 
_mm_unpackhi_pi32^{⚠}  Experimentalx86 and mmx Unpacks the upper element from two 
_mm_unpacklo_pi8^{⚠}  Experimentalx86 and mmx Unpacks the lower four elements from two 
_mm_unpacklo_pi16^{⚠}  Experimentalx86 and mmx Unpacks the lower two elements from two 
_mm_unpacklo_pi32^{⚠}  Experimentalx86 and mmx Unpacks the lower element from two 
has_cpuid  Experimentalx86 Does the host support the 
ud2^{⚠}  Experimentalx86 Generates the trap instruction 
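
Several of the experimental MMX entries above describe lane-wise arithmetic that is easier to follow as scalar code. The sketch below models three of them (`_mm_sad_pu8`, `_mm_movemask_pi8`, and one lane of `_mm_mulhrs_pi16`) in plain Rust. The helper names are hypothetical and not part of `core::arch`. A 64-bit vector is assumed to be represented as an array with the lowest lane first, and the rounding formula for `_mm_mulhrs_pi16` follows the usual description of the underlying `pmulhrsw` instruction rather than anything stated in the truncated entry.

```rust
// Scalar reference models for a few of the intrinsics listed above.
// All function names here are hypothetical; none of them exist in core::arch.
// A 64-bit MMX vector is modeled as an array of lanes, lowest lane first.

/// Models `_mm_sad_pu8`: sums the absolute differences of eight unsigned bytes.
/// The largest possible result is 8 * 255 = 2040, so it fits in 16 bits.
pub fn sad_pu8_model(a: [u8; 8], b: [u8; 8]) -> u16 {
    a.iter()
        .zip(b.iter())
        .map(|(&x, &y)| u16::from(x.abs_diff(y)))
        .sum()
}

/// Models `_mm_movemask_pi8`: gathers the most significant bit of each byte
/// into the low eight bits of the result, zero-extended to 32 bits.
pub fn movemask_pi8_model(a: [u8; 8]) -> i32 {
    let mut mask = 0i32;
    for (i, &byte) in a.iter().enumerate() {
        // Bit i of the mask is the sign bit of lane i.
        mask |= i32::from(byte >> 7) << i;
    }
    mask
}

/// Models one 16-bit lane of `_mm_mulhrs_pi16`: keep the 18 most significant
/// bits of the 32-bit product, add 1 to round, then drop the lowest bit.
pub fn mulhrs_lane_model(a: i16, b: i16) -> i16 {
    let product = i32::from(a) * i32::from(b);
    (((product >> 14) + 1) >> 1) as i16
}
```

Read as Q15 fixed point, `mulhrs_lane_model(0x4000, 0x4000)` multiplies 0.5 by 0.5 and yields `0x2000`, that is 0.25. Models like these can serve as test oracles when checking code that calls the real intrinsics.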
Type Definitions
__mmask16  Experimentalx86 The 