If you are reading this, it means you think you may be interested in using the SIMD extensions within kissfft.

Beware! Beyond here there be dragons!

This API is not easy to use, is not well documented, and breaks the KISS principle.

Still reading? Okay, you may get rewarded for your patience with a considerable speedup
(2-3x) on intel x86 machines with SSE if you are willing to jump through some hoops.

The basic idea is to use the packed 4 float __m128 data type as a scalar element.
This means that the format is pretty convoluted. It performs 4 FFTs per fft call on signals A,B,C,D.

For complex data, the data is interlaced as follows:
rA0,rB0,rC0,rD0, iA0,iB0,iC0,iD0, rA1,rB1,rC1,rD1, iA1,iB1,iC1,iD1 ...
where "rA0" is the real part of the zeroth sample for signal A

Real-only data is laid out:
rA0,rB0,rC0,rD0, rA1,rB1,rC1,rD1, ...

Compile with gcc flags something like
-O3 -mpreferred-stack-boundary=4 -DUSE_SIMD=1 -msse

Be aware of SIMD alignment. This is the most likely cause of segfaults.
The code within kissfft uses scratch variables on the stack.
With SIMD, these must have addresses on 16 byte boundaries.
Search on "SIMD alignment" for more info.

Robin at Divide Concept was kind enough to share his code for formatting to/from the SIMD kissfft.
I have not run it -- use it at your own risk.

void SSETools::pack128(float* target, float* source, unsigned long size128)
{
    __m128* pDest = (__m128*)target;
    __m128* pDestEnd = pDest+size128;
    float* source0=source;
    float* source1=source0+size128;
    float* source2=source1+size128;
    float* source3=source2+size128;

    while(pDest<pDestEnd)
    {
        *pDest=_mm_set_ps(*source3,*source2,*source1,*source0);
        source0++;
        source1++;
        source2++;
        source3++;
        pDest++;
    }
}

void SSETools::unpack128(float* target, float* source, unsigned long size128)
{

    float* pSrc = source;
    float* pSrcEnd = pSrc+size128*4;
    float* target0=target;
    float* target1=target0+size128;
    float* target2=target1+size128;
    float* target3=target2+size128;

    while(pSrc<pSrcEnd)
    {
        *target0=pSrc[0];
        *target1=pSrc[1];
        *target2=pSrc[2];
        *target3=pSrc[3];
        target0++;
        target1++;
        target2++;
        target3++;
        pSrc+=4;
    }
}
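
The interlaced complex layout and the 16 byte alignment rule described above are easy to get wrong, so here is a small scalar sketch that may help when checking a packed buffer by hand. It is not part of kissfft and uses no SSE intrinsics. The names packed_real_index, packed_imag_index, pack_complex_scalar and is_aligned16 are made up for this illustration. It assumes the four input signals are stored back to back, each as nfft interleaved complex pairs (r0,i0,r1,i1,...). Under that assumption it should produce the same float order as pack128 called with size128 = 2*nfft.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers, not part of kissfft.
// Float index of the real part of `sample` for `signal` (0..3 for A..D)
// in the packed layout rA0,rB0,rC0,rD0, iA0,iB0,iC0,iD0, rA1,...
inline std::size_t packed_real_index(std::size_t signal, std::size_t sample)
{
    return 8 * sample + signal;
}

// Float index of the matching imaginary part: one group of 4 floats later.
inline std::size_t packed_imag_index(std::size_t signal, std::size_t sample)
{
    return 8 * sample + 4 + signal;
}

// Scalar reference packer. `planar` is assumed to hold signals A,B,C,D
// back to back, each as nfft interleaved complex pairs (r0,i0,r1,i1,...).
std::vector<float> pack_complex_scalar(const std::vector<float>& planar,
                                       std::size_t nfft)
{
    assert(planar.size() == 8 * nfft);
    std::vector<float> packed(8 * nfft);
    for (std::size_t s = 0; s < 4; ++s) {
        const float* sig = planar.data() + s * 2 * nfft;
        for (std::size_t n = 0; n < nfft; ++n) {
            packed[packed_real_index(s, n)] = sig[2 * n];
            packed[packed_imag_index(s, n)] = sig[2 * n + 1];
        }
    }
    return packed;
}

// True if p sits on a 16 byte boundary, which is what aligned
// __m128 loads and stores require.
inline bool is_aligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

A std::vector<float> is used here only to keep the sketch short; it gives no 16 byte alignment guarantee, so buffers actually handed to the SIMD build would need to come from an aligned allocator or be declared with alignas(16). Calling is_aligned16 on each buffer before the FFT call is a cheap way to catch the segfault case early.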