If you are reading this, you may be interested in using the SIMD extensions within kissfft.

Beware! Beyond here there be dragons!

This API is not easy to use, is not well documented, and breaks the KISS principle.


Still reading? Okay, you may be rewarded for your patience with a considerable speedup
(2-3x) on Intel x86 machines with SSE if you are willing to jump through some hoops.

The basic idea is to use the packed 4-float __m128 data type as the scalar element.
This makes the data format fairly convoluted: each fft call performs 4 FFTs at once, on signals A, B, C and D.

For complex data, the data is interleaved as follows:
rA0,rB0,rC0,rD0,      iA0,iB0,iC0,iD0,   rA1,rB1,rC1,rD1, iA1,iB1,iC1,iD1 ...
where "rA0" is the real part of the zeroth sample of signal A and "iA0" is its imaginary part.

Real-only data is laid out:
rA0,rB0,rC0,rD0,     rA1,rB1,rC1,rD1,      ...
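To make the complex layout concrete, here is a minimal sketch that interleaves four complex signals into the order described above. It uses plain floats rather than __m128 so it compiles without SSE. The function name interleave_complex4 and the use of std::complex and std::vector are assumptions made for illustration; none of this is part of kissfft.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of kissfft: interleave four equal-length
// complex signals (A, B, C, D) into the layout
//   rA0,rB0,rC0,rD0, iA0,iB0,iC0,iD0, rA1,rB1,rC1,rD1, iA1,...
std::vector<float> interleave_complex4(
    const std::array<std::vector<std::complex<float>>, 4>& signals)
{
    const std::size_t n = signals[0].size();  // samples per signal
    for (std::size_t s = 1; s < 4; ++s)
        assert(signals[s].size() == n);       // all four signals must match in length

    std::vector<float> out(8 * n);            // 4 reals + 4 imaginaries per sample
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t s = 0; s < 4; ++s) {
            out[8 * k + s]     = signals[s][k].real();
            out[8 * k + 4 + s] = signals[s][k].imag();
        }
    }
    return out;
}
```

With this layout, the real part of sample k of signal s (0 = A ... 3 = D) sits at float index 8*k + s and the imaginary part at 8*k + 4 + s. For real-only data the index is simply 4*k + s.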

Compile with gcc flags something like
-O3 -mpreferred-stack-boundary=4  -DUSE_SIMD=1 -msse
(-mpreferred-stack-boundary=4 asks gcc to keep the stack aligned to 2^4 = 16 bytes.)

Be aware of SIMD alignment.  Misalignment is the most likely cause of segfaults.
The code within kissfft uses scratch variables on the stack.
With SIMD, these must have addresses on 16-byte boundaries.
Search for "SIMD alignment" for more info.
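As a small illustration of the 16-byte requirement, the sketch below checks whether an address is 16-byte aligned and shows one way to request that alignment for a stack buffer, using the standard C++11 alignas specifier. The helper names is_aligned16 and aligned_stack_buffer_demo are hypothetical and not part of kissfft. This only demonstrates the rule for buffers you own; it says nothing about how kissfft allocates its internal scratch space.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of kissfft: true if p lies on a 16-byte boundary.
inline bool is_aligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Creates an aligned stack buffer and reports whether the alignment rule holds.
inline bool aligned_stack_buffer_demo()
{
    // alignas(16) asks the compiler to place the buffer on a 16-byte boundary,
    // which is what an aligned __m128 load or store requires.
    alignas(16) float buf[8] = {0};
    assert(is_aligned16(buf));
    // An address 4 bytes past an aligned one is not 16-byte aligned.
    return is_aligned16(buf) && !is_aligned16(buf + 1);
}
```

A check like is_aligned16 can be dropped into debug builds just before a buffer is handed to the SIMD code, which tends to turn a mysterious segfault into a readable assertion failure.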


Robin at Divide Concept was kind enough to share his code for converting to/from the SIMD kissfft format.
I have not run it -- use it at your own risk.

#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_set_ps

// Interleave four signals, stored back to back in source (size128 floats each),
// into size128 __m128 values in target. target must be 16-byte aligned.
void SSETools::pack128(float* target, float* source, unsigned long size128)
{
   __m128* pDest = (__m128*)target;
   __m128* pDestEnd = pDest+size128;
   float* source0=source;
   float* source1=source0+size128;
   float* source2=source1+size128;
   float* source3=source2+size128;

   while(pDest<pDestEnd)
   {
       // _mm_set_ps takes its arguments from the highest lane to the lowest,
       // so *source0 ends up as the first float in memory.
       *pDest=_mm_set_ps(*source3,*source2,*source1,*source0);
       source0++;
       source1++;
       source2++;
       source3++;
       pDest++;
   }
}

// Inverse of pack128: split size128 interleaved groups of four floats in source
// back into four signals stored back to back in target (size128 floats each).
void SSETools::unpack128(float* target, float* source, unsigned long size128)
{
   float* pSrc = source;
   float* pSrcEnd = pSrc+size128*4;
   float* target0=target;
   float* target1=target0+size128;
   float* target2=target1+size128;
   float* target3=target2+size128;

   while(pSrc<pSrcEnd)
   {
       // Each group of four consecutive floats holds one sample of A, B, C and D.
       *target0=pSrc[0];
       *target1=pSrc[1];
       *target2=pSrc[2];
       *target3=pSrc[3];
       target0++;
       target1++;
       target2++;
       target3++;
       pSrc+=4;
   }
}
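The two routines above convert between a planar buffer (signal A, then B, then C, then D, each size128 floats long) and the interleaved SIMD layout. As one way to sanity-check them, here is a plain scalar sketch of the same mapping that does not need SSE. The names pack4_scalar and unpack4_scalar are hypothetical; they are meant as a reference to compare against, not as a replacement for the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference, not part of kissfft or SSETools.
// planar holds four signals back to back: A[0..n), B[0..n), C[0..n), D[0..n).
// Returns the interleaved form: A0,B0,C0,D0, A1,B1,C1,D1, ...
std::vector<float> pack4_scalar(const std::vector<float>& planar)
{
    assert(planar.size() % 4 == 0);
    const std::size_t n = planar.size() / 4;  // samples per signal
    std::vector<float> packed(planar.size());
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t s = 0; s < 4; ++s)
            packed[4 * k + s] = planar[s * n + k];
    return packed;
}

// Inverse of pack4_scalar: interleaved form back to planar.
std::vector<float> unpack4_scalar(const std::vector<float>& packed)
{
    assert(packed.size() % 4 == 0);
    const std::size_t n = packed.size() / 4;  // samples per signal
    std::vector<float> planar(packed.size());
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t s = 0; s < 4; ++s)
            planar[s * n + k] = packed[4 * k + s];
    return planar;
}
```

Running pack128 followed by unpack128 on the same data should likewise reproduce the original planar buffer, just as unpack4_scalar(pack4_scalar(x)) returns x here.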