
Linear Transformations and the Scaling of Lebesgue Measure

2025-04-06 · 18 min read

How the determinant controls volume distortion, and why translation invariance uniquely characterizes Lebesgue measure up to a scalar.

The problem

Throughout this article, $|E|$ denotes the outer Lebesgue measure of a set $E$ in its ambient Euclidean space. When $E$ is a rectangle, ball, or other elementary region, this agrees with its usual volume. Thus the same notation is used whether or not the set has already been shown to be measurable.

Let $\Phi: \mathbb{R}^n \to \mathbb{R}^n$ be a linear map and let $E \subseteq \mathbb{R}^n$ be any set. To determine $|\Phi(E)|$, we need to understand how $\Phi$ affects the measure of $E$. The natural guess is that

$$|\Phi(E)| = |\det\Phi|\,|E|.$$

This formula originates not in measure theory but in the geometric meaning of the determinant.

From the geometric point of view, if $v_1,\ldots,v_n\in\mathbb{R}^n$, then $|\det(v_1,\ldots,v_n)|$ is the $n$-dimensional volume of the parallelepiped spanned by these vectors. Thus, if $A:\mathbb{R}^n\to\mathbb{R}^n$ is linear, then $A$ sends the unit cube to the parallelepiped spanned by the column vectors of $A$, and the volume of that parallelepiped is $|\det A|$.

Figure: The determinant as the volume of a parallelepiped. The determinant records the signed volume distortion of a linear map. Image: Claudio Rocchini, Wikimedia Commons, CC BY 3.0/GFDL.

This is the finite-dimensional geometric core of the theorem.
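The claim about the unit cube can be checked by hand in the plane. The following is a minimal sketch, assuming an arbitrary example matrix `A` and helper names (`det2`, `apply`, `shoelace_area`) that are not part of the article: it maps the unit square through `A` and compares the area of the image, computed with the shoelace formula, against $|\det A|$.

```python
def det2(a):
    """Determinant of a 2x2 matrix given as ((a11, a12), (a21, a22))."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def apply(a, p):
    """Apply the 2x2 matrix a to the point p = (x, y)."""
    return (a[0][0] * p[0] + a[0][1] * p[1],
            a[1][0] * p[0] + a[1][1] * p[1])


def shoelace_area(vertices):
    """Unsigned area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


# Arbitrary example matrix (an assumption made for illustration).
A = ((2.0, 1.0), (0.5, 3.0))
unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# The image of the unit square is the parallelogram spanned by the columns of A.
image = [apply(A, p) for p in unit_square]
area = shoelace_area(image)
print(area, abs(det2(A)))
```

Both printed numbers should coincide, which is the two-dimensional case of the statement above.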

For rectangles, parallelepipeds, and simple polyhedral regions, the determinant already explains volume distortion. The deeper question is whether the same formula remains valid for an arbitrary set $E\subseteq\mathbb{R}^n$:

$$|A(E)|=|\det A|\,|E|.$$

This is no longer merely a problem of linear algebra. It requires a theory of volume that applies to irregular sets. This is exactly the role of Lebesgue measure. Lebesgue's measure-theoretic viewpoint makes it possible to assign volume not only to elementary regions, but also to much more complicated subsets of Euclidean space.

Therefore, the theorem can be read as follows: the classical determinant formula for parallelepipeds extends to all subsets of Euclidean space, once volume is interpreted as outer Lebesgue measure.
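For a set that is not a parallelepiped, the formula can at least be tested numerically. The sketch below is only an illustration under stated assumptions: the matrix `A`, the choice of $E$ as the open unit disk, and the grid step `h` are all picked for the example, and counting grid cells gives an approximation of the area rather than a proof.

```python
import math

# Example matrix (an assumption for illustration); det A = 6.
A = ((2.0, 1.0), (0.0, 3.0))
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]


def inverse_apply(y):
    """Return A^{-1} y for the 2x2 matrix A above."""
    return ((A[1][1] * y[0] - A[0][1] * y[1]) / det_A,
            (-A[1][0] * y[0] + A[0][0] * y[1]) / det_A)


def in_image_of_disk(y):
    """y lies in A(E), with E the open unit disk, iff A^{-1} y lies in E."""
    x = inverse_apply(y)
    return x[0] * x[0] + x[1] * x[1] < 1.0


def grid_area(indicator, half_width, h):
    """Sum h^2 over the grid cells of [-half_width, half_width]^2 whose center satisfies indicator."""
    steps = int(round(2 * half_width / h))
    count = 0
    for i in range(steps):
        for j in range(steps):
            center = (-half_width + (i + 0.5) * h, -half_width + (j + 0.5) * h)
            if indicator(center):
                count += 1
    return count * h * h


estimate = grid_area(in_image_of_disk, half_width=4.0, h=0.02)
expected = abs(det_A) * math.pi  # |det A| times the area of the unit disk
print(estimate, expected)
```

The estimate should approach $|\det A|\,\pi$ as `h` shrinks; the box $[-4,4]^2$ is large enough to contain the image of the disk for this particular `A`.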

Why should this hold? We give two proofs: the first relies on explicit analytic estimates, and the second is more structural, using only a few general properties of measures.

First Proof

Lemma

Suppose $\Phi: \mathbb{R}^n \to \mathbb{R}^n$ is a Lipschitz mapping. If $E \subseteq \mathbb{R}^n$ is Lebesgue measurable, then $\Phi(E)$ is measurable.

Proof.

Step 1. If $|E| = 0$, then $|\Phi(E)| = 0$. Let $C > 0$ be a Lipschitz constant for $\Phi$, so that $\|\Phi(x) - \Phi(y)\| \leq C\|x - y\|$ for all $x, y$, and let $\varepsilon > 0$ be arbitrary. There exist cubes $\{C_i\}$ covering $E$ such that

$$\sum_{i} |C_i| < \frac{2^n}{\omega_n n^{n/2} C^n}\,\varepsilon,$$

where $\omega_n = \frac{\pi^{n/2}}{\Gamma(n/2+1)}$ is the volume of the unit ball $B(0,1) \subseteq \mathbb{R}^n$ and each $C_i$ is a cube of side length $b_i - a_i$. We have

$$\operatorname{diam} C_i = \sqrt{n(b_i - a_i)^2} = \sqrt{n}\,(b_i - a_i).$$

From the definition of diameter,

$$\begin{aligned} \operatorname{diam} \Phi(C_i) &= \sup\{\|x - y\| \mid x,y \in \Phi(C_i)\} \\ &= \sup\{\|\Phi(x) - \Phi(y)\| \mid x,y \in C_i\} \\ &\leq C \operatorname{diam}(C_i) = \sqrt{n}\,C(b_i - a_i). \end{aligned}$$

More precisely, every point of $C_i$ lies within distance $\frac{\operatorname{diam} C_i}{2}$ of the center $c_i$ of the cube, so $\Phi(C_i)$ is contained in the closed ball of radius $\frac{C \operatorname{diam} C_i}{2}$ centered at $\Phi(c_i)$. Hence

$$\begin{aligned} |\Phi(C_i)| &\leq \omega_n \left(\frac{C\operatorname{diam}(C_i)}{2}\right)^n \\ &= \frac{\omega_n n^{n/2} C^n}{2^n}(b_i - a_i)^n = \frac{\omega_n n^{n/2} C^n}{2^n}|C_i|. \end{aligned}$$

Since $\{\Phi(C_i)\}$ covers $\Phi(E)$, countable subadditivity gives

$$\begin{aligned} |\Phi(E)| &\leq \sum_i |\Phi(C_i)| \\ &\leq \frac{\omega_n n^{n/2} C^n}{2^n} \sum_i |C_i| \\ &< \varepsilon. \end{aligned}$$

Since $\varepsilon$ is arbitrary, $|\Phi(E)| = 0$.

Step 2. If $F \subseteq \mathbb{R}^n$ is compact, then $\Phi(F)$ is compact.

Let $(x_k) \subseteq \Phi(F)$ be any sequence. For each $k$, choose $y_k \in F$ such that $\Phi(y_k) = x_k$. Since $F$ is compact, there is a subsequence $(y_{k_j})$ converging to some $y \in F$. Since $\Phi$ is Lipschitz, it is continuous, so $x_{k_j} = \Phi(y_{k_j}) \to \Phi(y) \in \Phi(F)$. Thus every sequence in $\Phi(F)$ has a subsequence converging in $\Phi(F)$, and $\Phi(F)$ is compact.

Step 3. Let $H$ be an $F_\sigma$ set in $\mathbb{R}^n$; then $\Phi(H)$ is an $F_\sigma$ set in $\mathbb{R}^n$.

Write $H = \bigcup_{i=1}^\infty F_i$, where the $F_i$ are closed. For each $i$, define $K_{ij} = F_i \cap \overline{B(0,j)}$, so that

$$F_i = \bigcup_{j=1}^\infty K_{ij}.$$

Since $F_i$ is closed, $K_{ij}$ is a closed subset of the compact set $\overline{B(0,j)}$, hence compact. It follows from Step 2 that $\Phi(K_{ij})$ is compact for all $i,j$, and one can write

$$\Phi(H) = \bigcup_{i,j=1}^\infty \Phi(K_{ij}).$$

Therefore $\Phi(H)$ is $F_\sigma$.

Step 4. Let $A$ be measurable, and assume first that $|A| < \infty$. By inner regularity of Lebesgue measure,

$$|A| = \sup\{|K| \mid K \subseteq A,\ K \text{ compact}\}.$$

For each $k \geq 1$, choose a compact subset $K_k \subseteq A$ such that

$$|K_k| > |A| - \frac{1}{k}.$$

Set $H = \bigcup_{k=1}^\infty K_k$; then $H$ is $F_\sigma$. Since $H \subseteq A$, one has $|H| \leq |A|$, and $|H| \geq |K_k| > |A| - \frac{1}{k}$ for all $k$, so $|H| = |A|$. Because $|A| < \infty$, this gives $|A \setminus H| = |A| - |H| = 0$, so $N = A \setminus H$ is a null set. If $|A| = \infty$, apply the argument to each bounded piece $A \cap B(0,j)$ and take the union over $j$: a countable union of $F_\sigma$ sets is $F_\sigma$, and a countable union of null sets is null.

Step 5. By Step 4, write $E = H \cup N$, where $H$ is $F_\sigma$ and $N$ is a null set. Let $A \subseteq \mathbb{R}^n$ be any subset. Since $\Phi(H) \subseteq \Phi(E)$, we have $\Phi(E)^c \subseteq \Phi(H)^c$. Since $H$ is $F_\sigma$, it follows from Step 3 that $\Phi(H)$ is $F_\sigma$, hence Borel and therefore measurable. We have the estimate

$$|\Phi(E)^c \cap A| \leq |\Phi(H)^c \cap A|.$$

By Step 1, $\Phi$ maps null sets to null sets, so $|\Phi(N)| = 0$. Then

$$\begin{aligned} |\Phi(E) \cap A| &= |[\Phi(H) \cup \Phi(N)] \cap A| \\ &\leq |\Phi(H) \cap A| + |\Phi(N) \cap A| \\ &\leq |\Phi(H) \cap A| + |\Phi(N)| \\ &= |\Phi(H) \cap A|. \end{aligned}$$

Since $\Phi(H)$ is measurable, combining the two estimates yields

$$\begin{aligned} |\Phi(E) \cap A| + |\Phi(E)^c \cap A| &\leq |\Phi(H) \cap A| + |\Phi(H)^c \cap A| \\ &= |A|. \end{aligned}$$

The reverse inequality follows from subadditivity of outer measure. Hence $\Phi(E)$ satisfies the Carathéodory condition and is measurable.

Now let $A: \mathbb{R}^n \to \mathbb{R}^n$ be any linear map. We show that $A$ is Lipschitz. Writing $x = \sum_{i=1}^n x_i e^i$ in the standard basis,

$$\|Ax\| = \left\|\sum_{i=1}^n x_i Ae^i\right\| \leq \sum_{i=1}^n |x_i| \|Ae^i\|.$$

Applying Cauchy–Schwarz,

$$\|Ax\| \leq \left(\sum_i \|Ae^i\|^2\right)^{1/2} \left(\sum_i x_i^2\right)^{1/2} = C\|x\|,$$

where $C = \left(\sum_i \|Ae^i\|^2\right)^{1/2} < \infty$. By linearity, $\|Ax - Ay\| = \|A(x - y)\| \leq C\|x - y\|$ for all $x, y$. Thus $A$ is Lipschitz.
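The constant $C$ obtained from Cauchy–Schwarz is the square root of the sum of the squared column norms of the matrix. As a quick sanity check, here is a minimal sketch that computes this constant for an arbitrarily chosen $3\times 3$ matrix and compares it with $\|Ax\|/\|x\|$ on a few sample vectors. The matrix, the samples, and the helper names are assumptions made for illustration.

```python
import math


def mat_vec(a, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def norm(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(t * t for t in x))


def lipschitz_bound(a):
    """C = (sum_i ||A e^i||^2)^{1/2}, where A e^i is the i-th column of A."""
    n_cols = len(a[0])
    columns = [[row[i] for row in a] for i in range(n_cols)]
    return math.sqrt(sum(norm(col) ** 2 for col in columns))


# Arbitrary example data (assumptions for illustration).
A = [[2.0, -1.0, 0.0],
     [0.5, 3.0, 1.0],
     [0.0, 4.0, -2.0]]
samples = [[1.0, 0.0, 0.0], [1.0, 1.0, 1.0], [-2.0, 0.5, 3.0], [0.1, -0.7, 0.2]]

C = lipschitz_bound(A)
ratios = [norm(mat_vec(A, x)) / norm(x) for x in samples]
print(C, max(ratios))  # every ratio stays below C
```

The bound is usually not sharp: the smallest valid Lipschitz constant is the largest singular value of $A$, which is at most $C$.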

We state the SVD theorem, to be proved later using the spectral theorem.

Theorem (SVD)

Let $A \in M_{m \times n}(\mathbb{R})$. Then there exist $U \in O(m)$, $V \in O(n)$, and a diagonal matrix $\Sigma$ with nonnegative entries such that $A = U\Sigma V^T$.

Figure: Singular value decomposition as rotations and scaling. SVD decomposes a linear map into rotations and coordinate-axis scaling. Image: Georg-Johann, Wikimedia Commons, CC BY-SA 3.0/GFDL.
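For a $2\times 2$ matrix the singular values can be written in closed form, since they are the square roots of the eigenvalues of $A^T A$. The sketch below uses that closed form for an example matrix chosen for illustration and checks that the product of the singular values equals $|\det A|$. It does not compute the factors $U$ and $V$, and the function name is hypothetical.

```python
import math


def singular_values_2x2(a):
    """Singular values of a real 2x2 matrix: square roots of the eigenvalues of A^T A."""
    (p, q), (r, s) = a
    trace = p * p + q * q + r * r + s * s  # trace of A^T A
    det = p * s - q * r                    # det A, so det(A^T A) = det^2
    disc = math.sqrt(max(trace * trace - 4.0 * det * det, 0.0))
    sigma1 = math.sqrt((trace + disc) / 2.0)
    sigma2 = math.sqrt(max((trace - disc) / 2.0, 0.0))
    return sigma1, sigma2


A = ((2.0, 1.0), (0.0, 3.0))  # example matrix with det A = 6
s1, s2 = singular_values_2x2(A)
print(s1, s2, s1 * s2)
```

The product `s1 * s2` plays the role of $|\det\Sigma|$ in the decomposition $A = U\Sigma V^T$.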

Historically, this decomposition was discovered independently by Eugenio Beltrami in 1873 and Camille Jordan in 1874 in the context of bilinear forms. James Joseph Sylvester later arrived at a related decomposition for real square matrices and called the singular values the canonical multipliers of the matrix. In the twentieth century, the theory was extended and connected with integral operators by Schmidt and Weyl, while Eckart and Young made the decomposition central to low-rank approximation.

With this theorem, it suffices to verify the result when $\Phi$ is an orthogonal map or a diagonal (scaling) matrix. Translations are treated as well, since they are needed for affine maps.

Case 1: $\Phi$ is orthogonal (a rotation, possibly composed with a reflection), i.e. $\Phi \in \{A: \mathbb{R}^n \to \mathbb{R}^n \mid A A^T = I_n\}$.

Step 1: $\Phi$ is an isometry. For $x, y \in \mathbb{R}^n$,

$$\langle \Phi x, \Phi y \rangle = (\Phi x)^T(\Phi y) = x^T \Phi^T \Phi y = x^T y = \langle x, y \rangle.$$

Taking $y = x$, this implies $\|\Phi x\| = \|x\|$. Since $\Phi$ is invertible, $\Phi^{-1} \in O(n)$ and thus $\Phi^{-1}$ is also an isometry.

Step 2: $\Phi(B(x_i, r)) = B(\Phi(x_i), r)$ and covers are preserved. Since $\Phi$ is invertible,

$$\begin{aligned} \Phi(B(x_i, r)) &= \{\Phi x \mid \|x - x_i\| < r\} \\ &= \{y \mid \|\Phi^{-1}(y) - x_i\| < r\} \\ &= \{y \mid \|y - \Phi x_i\| < r\} \\ &= B(\Phi(x_i), r). \end{aligned}$$

It follows that $\{B(x_i, r)\}$ covers $E$ if and only if $\{B(\Phi(x_i), r)\}$ covers $\Phi(E)$.

Step 3: $|\Phi(E)| = |E|$. We may assume $|E| < \infty$ for the first inequality, which is otherwise trivial. Let $\varepsilon > 0$. Choose countably many open balls $\{B(x_i, r_i)\}$ covering $E$ such that $\sum_i |B(x_i, r_i)| < |E| + \varepsilon$; here we use the standard fact that outer Lebesgue measure can be computed from covers by open balls. Then $\{B(\Phi(x_i), r_i)\}$ covers $\Phi(E)$ by Step 2. Since the volume of a ball depends only on its radius, $|B(\Phi(x_i), r_i)| = |B(x_i, r_i)|$, so

$$\begin{aligned} |\Phi(E)| &\leq \sum_i |B(\Phi(x_i), r_i)| \\ &= \sum_i |B(x_i, r_i)| \\ &< |E| + \varepsilon. \end{aligned}$$

Similarly, choose open balls $\{B(y_i, s_i)\}$ covering $\Phi(E)$ such that $\sum_i |B(y_i, s_i)| < |\Phi(E)| + \varepsilon$. Then $\{B(\Phi^{-1}(y_i), s_i)\}$ covers $E$. Since $\Phi^{-1} \in O(n)$ is also an isometry,

$$\begin{aligned} |E| &\leq \sum_i |B(\Phi^{-1}(y_i), s_i)| \\ &= \sum_i |B(y_i, s_i)| \\ &< |\Phi(E)| + \varepsilon. \end{aligned}$$

Since $\varepsilon$ is arbitrary, $|\Phi(E)| = |E|$.
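The isometry property behind this case is easy to test numerically in the plane. The following sketch is illustrative only, with an arbitrary angle and arbitrary sample points as assumptions: it checks that a rotation matrix preserves inner products and distances up to floating-point rounding.

```python
import math


def rotation(theta):
    """2x2 rotation matrix by the angle theta; it satisfies Phi Phi^T = I."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))


def apply(m, p):
    """Apply the 2x2 matrix m to the point p."""
    return (m[0][0] * p[0] + m[0][1] * p[1],
            m[1][0] * p[0] + m[1][1] * p[1])


def dot(u, v):
    """Euclidean inner product in the plane."""
    return u[0] * v[0] + u[1] * v[1]


Phi = rotation(0.7)                # arbitrary angle
x, y = (1.5, -2.0), (0.25, 4.0)    # arbitrary sample points

lhs = dot(apply(Phi, x), apply(Phi, y))  # <Phi x, Phi y>
rhs = dot(x, y)                          # <x, y>

px, py = apply(Phi, x), apply(Phi, y)
dist_before = math.hypot(x[0] - y[0], x[1] - y[1])
dist_after = math.hypot(px[0] - py[0], px[1] - py[1])
print(lhs, rhs, dist_before, dist_after)
```

Since distances are preserved, a ball of radius $r$ around $x$ is carried exactly onto the ball of radius $r$ around $\Phi x$, which is what the covering argument uses.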

Case 2: $\Phi = \operatorname{diag}[\sigma_1, \dots, \sigma_n]$. One can assume $\sigma_i > 0$ for all $i$: if some $\sigma_i = 0$, then $\Phi(E)$ lies in a coordinate hyperplane, so $|\Phi(E)| = 0 = |\det\Phi|\,|E|$, and a factor $\sigma_i < 0$ merely reflects the $i$-th coordinate, which swaps the endpoints of each interval without changing its length. Let $R = \prod_{i=1}^n (a_i, b_i]$ be any rectangle. Then

$$\Phi(R) = \prod_{i=1}^n (\sigma_i a_i, \sigma_i b_i].$$

Since $\Phi(R)$ is a rectangle,

$$|\Phi(R)| = \prod_{i=1}^n (\sigma_i b_i - \sigma_i a_i) = \Bigl(\prod_{i=1}^n \sigma_i\Bigr)\prod_{i=1}^n (b_i - a_i) = |\det \Phi| \cdot |R|.$$

Since $\Phi$ is bijective, $\{R_i\}$ covers $E$ if and only if $\{\Phi(R_i)\}$ covers $\Phi(E)$. Therefore

$$\begin{aligned} |\Phi(E)| &= \inf\left\{\sum |R_i| \;\middle|\; \{R_i\} \text{ covers } \Phi(E)\right\} \\ &= \inf\left\{\sum |\det\Phi| \cdot |\Phi^{-1}(R_i)| \;\middle|\; \{\Phi^{-1}(R_i)\} \text{ covers } E\right\} \\ &= |\det\Phi| \cdot |E|. \end{aligned}$$
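As a concrete instance of the rectangle computation, take $n = 2$, $\Phi = \operatorname{diag}[2, 3]$ and $R = (0,1] \times (1,3]$. These values are chosen purely for illustration.

```latex
\begin{aligned}
\Phi(R) &= (0, 2] \times (3, 9], \\
|\Phi(R)| &= (2 - 0)(9 - 3) = 12, \\
|\det\Phi| \cdot |R| &= (2 \cdot 3) \cdot \bigl((1 - 0)(3 - 1)\bigr) = 6 \cdot 2 = 12.
\end{aligned}
```

Both sides agree, as the general formula predicts.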

Case 3: $\Phi(x) = x + x_0$, where $x_0 = (x_1, \dots, x_n) \in \mathbb{R}^n$. Let $R = \prod_{i=1}^n (a_i, b_i]$ be any rectangle. Then $\Phi(R) = \prod_{i=1}^n (a_i + x_i, b_i + x_i]$, and since $\Phi(R)$ is a rectangle,

$$|\Phi(R)| = \prod_{i=1}^n (b_i + x_i - a_i - x_i) = |R|.$$

Since $\Phi$ is bijective, $\{R_i\}$ covers $E$ if and only if $\{\Phi(R_i)\}$ covers $\Phi(E)$. Therefore $|\Phi(E)| = |E|$.

For any linear map $\Phi: \mathbb{R}^n \to \mathbb{R}^n$, the SVD theorem gives $\Phi = U\Sigma V^T$ with $U, V \in O(n)$ and $\Sigma$ diagonal. Let $E \subseteq \mathbb{R}^n$ be arbitrary. Since $U$ and $V^T$ are isometries (Case 1) and $\Sigma$ is diagonal (Case 2),

$$\begin{aligned} |\Phi(E)| &= |U\Sigma V^T(E)| \\ &= |\Sigma V^T(E)| \\ &= |\det\Sigma| \cdot |V^T(E)| \\ &= |\det\Sigma| \cdot |E|. \end{aligned}$$

Since $|\det U| = |\det V| = 1$, we have $|\det\Phi| = |\det\Sigma|$. Hence

$$\boxed{|\Phi(E)| = |\det\Phi| \cdot |E|.}$$

Second Proof

A more standard way to prove such a result is to begin with a small class of sets where the formula is transparent, and then extend it to a larger $\sigma$-algebra.

For example, one first verifies the formula on half-open rectangles or cubes. These sets generate the Borel $\sigma$-algebra of $\mathbb{R}^n$. Measure theory then supplies extension tools which allow a statement known on the generating class to be promoted first to Borel sets and then, by the outer measure definition, to all subsets.

This is the conceptual role of the Carathéodory extension principle and related monotone-class or $\pi$-$\lambda$ arguments. In this section, we define two classes of set systems that interact well with measures, and we prove one theorem and two lemmas.

A nonempty collection of subsets $\mathcal{P} \subseteq 2^X$ is a $\pi$-system if

$$A, B \in \mathcal{P} \quad \Rightarrow \quad A \cap B \in \mathcal{P}.$$

A collection of subsets $\mathcal{L} \subseteq 2^X$ is a $\lambda$-system if:

  1. $X \in \mathcal{L}$.
  2. $A, B \in \mathcal{L}$ and $A \subseteq B$ implies $B \setminus A \in \mathcal{L}$.
  3. If $\{A_k\} \subseteq \mathcal{L}$ and $A_k \subseteq A_{k+1}$ for all $k$, then $\bigcup_{k=1}^\infty A_k \in \mathcal{L}$.
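
On a finite set both definitions can be checked mechanically. The sketch below is an illustration under stated assumptions: the function names and the example collection are not from the article, and on a finite set condition 3 holds automatically because an increasing sequence of subsets is eventually constant. The example is a standard one: a $\lambda$-system on $\{1,2,3,4\}$ that is not a $\pi$-system.

```python
def is_pi_system(collection):
    """Closed under pairwise (hence finite) intersections."""
    return all((a & b) in collection for a in collection for b in collection)


def is_lambda_system(collection, universe):
    """Check conditions 1 and 2 of the definition.

    On a finite universe condition 3 is automatic: an increasing sequence
    of subsets is eventually constant, so its union is one of its members.
    """
    if universe not in collection:
        return False
    return all((b - a) in collection
               for a in collection for b in collection if a <= b)


X = frozenset({1, 2, 3, 4})
# Example collection (an assumption for illustration): all subsets listed below.
L = {frozenset(s) for s in [(), (1, 2), (3, 4), (1, 3), (2, 4), (1, 2, 3, 4)]}

print(is_lambda_system(L, X))  # True
print(is_pi_system(L))         # False, since {1,2} & {1,3} = {1} is missing
```

This shows that the two notions are genuinely different, which is why the theorem below needs both hypotheses.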

Theorem ($\pi$-$\lambda$ Theorem)

If $\mathcal{P}$ is a $\pi$-system and $\mathcal{L}$ is a $\lambda$-system with $\mathcal{P} \subseteq \mathcal{L}$, then $\sigma(\mathcal{P}) \subseteq \mathcal{L}$.

Proof.

Let

$$\mathcal{S} = \bigcap_{\substack{ \mathcal{L}' \supseteq \mathcal{P} \\ \mathcal{L}' \text{ is a } \lambda\text{-system} }} \mathcal{L}'$$

be the smallest $\lambda$-system containing $\mathcal{P}$. Then $\mathcal{P} \subseteq \mathcal{S} \subseteq \mathcal{L}$, and $\mathcal{S}$ is itself a $\lambda$-system by construction.

Claim: $\mathcal{S}$ is a $\pi$-system.

Fix any $A \in \mathcal{S}$ and define $\mathcal{A} = \{C \subseteq X \mid A \cap C \in \mathcal{S}\}$. One can verify directly that $\mathcal{A}$ is a $\lambda$-system: $X \in \mathcal{A}$ because $A \cap X = A \in \mathcal{S}$; if $C \subseteq D$ are in $\mathcal{A}$, then $A \cap (D \setminus C) = (A \cap D) \setminus (A \cap C) \in \mathcal{S}$; and increasing unions are handled in the same way.

Step 1: First take $A \in \mathcal{P}$. For any $P \in \mathcal{P}$, since $\mathcal{P}$ is a $\pi$-system, $A \cap P \in \mathcal{P} \subseteq \mathcal{S}$, so $P \in \mathcal{A}$. Hence $\mathcal{P} \subseteq \mathcal{A}$, and since $\mathcal{S}$ is the smallest $\lambda$-system containing $\mathcal{P}$, we get $\mathcal{S} \subseteq \mathcal{A}$. This means $A \cap C \in \mathcal{S}$ for all $A \in \mathcal{P}$ and all $C \in \mathcal{S}$.

Step 2: Now take any $A \in \mathcal{S}$. By Step 1 (with the roles of the two sets exchanged), for any $P \in \mathcal{P}$, $A \cap P \in \mathcal{S}$, so $P \in \mathcal{A}$. Hence $\mathcal{P} \subseteq \mathcal{A}$, and again $\mathcal{S} \subseteq \mathcal{A}$. In particular, for any $B \in \mathcal{S}$, $B \in \mathcal{A}$, which means $A \cap B \in \mathcal{S}$.

Hence $\mathcal{S}$ is closed under finite intersections, i.e., a $\pi$-system.

Claim: $\mathcal{S}$ is a $\sigma$-algebra. Since $\mathcal{S}$ is a $\lambda$-system, $X \in \mathcal{S}$ and $X \setminus X = \varnothing \in \mathcal{S}$. If $A \in \mathcal{S}$, then $A^c = X \setminus A \in \mathcal{S}$. For countable unions: given $\{A_k\} \subseteq \mathcal{S}$, set $B_k = A_1 \cup \cdots \cup A_k$. Since $\mathcal{S}$ is a $\pi$-system closed under complements, it is closed under finite unions by De Morgan's laws, so $B_k \in \mathcal{S}$. Since $B_k \nearrow \bigcup_k A_k$ and $\mathcal{S}$ is a $\lambda$-system, $\bigcup_k A_k \in \mathcal{S}$. Hence $\mathcal{S}$ is a $\sigma$-algebra.

Therefore $\sigma(\mathcal{P}) \subseteq \sigma(\mathcal{S}) = \mathcal{S} \subseteq \mathcal{L}$.
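The role of the $\pi$-system hypothesis can be observed on small finite examples. The following sketch (the function names and example generators are assumptions made for illustration) computes the smallest $\lambda$-system and the smallest $\sigma$-algebra containing a given family by brute-force closure. On a finite set, countable operations reduce to finite ones.

```python
def lambda_closure(generators, universe):
    """Smallest lambda-system on a finite universe containing the generators."""
    system = set(generators) | {universe}
    changed = True
    while changed:
        changed = False
        for a in list(system):
            for b in list(system):
                # Close under differences of nested sets.
                if a <= b and (b - a) not in system:
                    system.add(b - a)
                    changed = True
    return system


def sigma_closure(generators, universe):
    """Smallest sigma-algebra on a finite universe containing the generators."""
    system = set(generators) | {universe}
    changed = True
    while changed:
        changed = False
        for a in list(system):
            # Close under complements and pairwise unions.
            for candidate in [universe - a] + [a | b for b in list(system)]:
                if candidate not in system:
                    system.add(candidate)
                    changed = True
    return system


# A pi-system: {1} & {1,2} = {1} is again in the family.
X3 = frozenset({1, 2, 3})
P = {frozenset({1}), frozenset({1, 2})}
same = lambda_closure(P, X3) == sigma_closure(P, X3)

# Not a pi-system: {1,2} & {2,3} = {2} is missing from the family.
X4 = frozenset({1, 2, 3, 4})
G = {frozenset({1, 2}), frozenset({2, 3})}
lam_G = lambda_closure(G, X4)
sig_G = sigma_closure(G, X4)
print(same, len(lam_G), len(sig_G))
```

For the $\pi$-system the two closures coincide, as the theorem asserts; for the second family the generated $\lambda$-system is strictly smaller than the generated $\sigma$-algebra.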

Historically, this result is closely associated with Eugene Dynkin and is also known under the name Sierpiński-Dynkin theorem.

Lemma

Let $A \subseteq \mathbb{R}^n$ be a compact subset whose intersection with $\{c\} \times \mathbb{R}^{n-1}$ has $(n-1)$-dimensional measure zero for every $c \in \mathbb{R}$. Then $A$ has measure zero.

Proof.

Since $A$ is compact, there exists a closed interval $[a,b]$ such that $A \subseteq [a,b] \times \mathbb{R}^{n-1}$. For each $c \in [a,b]$, denote $A_c = \{x \in \mathbb{R}^{n-1} \mid (c,x) \in A\}$.

Let $\varepsilon > 0$ be arbitrary. Since $A_c$ is compact and $|A_c| = 0$ in $\mathbb{R}^{n-1}$, there exist finitely many $(n-1)$-dimensional open cubes $\{C_1, \dots, C_k\}$ covering $A_c$ with $\sum_i |C_i| < \varepsilon$. Set $U_c = C_1 \cup \cdots \cup C_k$, so that $|U_c| < \varepsilon$.

Claim: There exists an open interval $J_c \ni c$ such that $A \cap (J_c \times \mathbb{R}^{n-1}) \subseteq J_c \times U_c$.

Suppose not. Then there exists a sequence $(c_i, x_i) \in A$ with $c_i \to c$ and $x_i \notin U_c$. Since $A$ is compact, passing to a subsequence, $(c_i, x_i) \to (c, x)$ for some $(c,x) \in A$. In particular $x \in A_c$. But $U_c$ is open and $x_i \notin U_c$ for all $i$, so the limit $x$ lies in the closed set $\mathbb{R}^{n-1} \setminus U_c$, contradicting $A_c \subseteq U_c$. This proves the claim.

Since $\{J_c\}_{c \in [a,b]}$ is an open cover of the compact set $[a,b]$, it admits a finite subcover $\{J_{c_1}, \dots, J_{c_m}\}$. If necessary, we can shrink overlapping parts so that $\sum_k |J_{c_k}| \leq 2(b-a)$; the inclusion in the claim remains valid on any subinterval of $J_c$. Then

$$\begin{aligned} |A| &\leq \sum_{k=1}^m |J_{c_k} \times U_{c_k}| \\ &= \sum_{k=1}^m |J_{c_k}| \cdot |U_{c_k}| \\ &\leq 2(b-a)\varepsilon. \end{aligned}$$

Since $\varepsilon$ is arbitrary, $|A| = 0$.

Lemma

Every proper affine subspace of $\mathbb{R}^n$ has Lebesgue measure zero in $\mathbb{R}^n$.

Proof.

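Let $V$ be a proper affine subspace of $\mathbb{R}^n$. If $V$ is empty, it trivially has measure zero. Otherwise $V$ is contained in a hyperplane: there exist $a = (a_1, \dots, a_n) \neq 0$ and $b \in \mathbb{R}$ such that

$$V \subseteq H := \{x \in \mathbb{R}^n \mid a \cdot x = b\},$$

and by monotonicity it suffices to show $|H| = 0$. We argue by induction on $n$; for $n = 1$, $H$ is a single point.

Fix $1 \leq i \leq n$ such that $a_i \neq 0$. For any $x \in H$,

$$\begin{aligned} x_i &= \frac{1}{a_i}\left(b - \sum_{j \neq i} a_j x_j\right) \\ &=: F(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n). \end{aligned}$$

So $H$ is the graph of the continuous function $F: \mathbb{R}^{n-1} \to \mathbb{R}$; in particular $H$ is closed. Let $n \geq 2$ and distinguish two cases.

If $a_j = 0$ for all $j \geq 2$, then $H = \{b/a_1\} \times \mathbb{R}^{n-1}$. For each $k$, the piece $\{b/a_1\} \times [-k,k]^{n-1}$ is contained in a rectangle of volume $2\delta(2k)^{n-1}$ for every $\delta > 0$, so it is null, and $H$ is a countable union of such pieces.

Otherwise $(a_2, \dots, a_n) \neq 0$, and we apply the previous lemma. For any $c \in \mathbb{R}$, the slice $H \cap (\{c\} \times \mathbb{R}^{n-1})$ equals $\{c\} \times \{(x_2, \dots, x_n) \in \mathbb{R}^{n-1} \mid \sum_{j=2}^n a_j x_j = b - a_1 c\}$, whose second factor is a hyperplane in $\mathbb{R}^{n-1}$ and hence has $(n-1)$-dimensional measure zero by the induction hypothesis. Since $H$ is closed, it is the countable union of the compact sets $H \cap \overline{B(0,k)}$, and each of these has slices of $(n-1)$-measure zero. By the previous lemma each $H \cap \overline{B(0,k)}$ is null, so $H$, and therefore $V$, has Lebesgue measure zero.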
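The lemma can be made tangible with an explicit cover in the plane. The sketch below is an illustration only, not the argument used in the proof: it covers the diagonal segment $\{(t,t) : 0 \le t \le 1\}$, a bounded piece of a line in $\mathbb{R}^2$, by $N$ closed squares of side $1/N$, whose total area $1/N$ can be made arbitrarily small. The function names are hypothetical, and exact rational arithmetic is used to avoid rounding issues at the square boundaries.

```python
from fractions import Fraction


def diagonal_cover(num_squares):
    """Closed squares [k/N, (k+1)/N]^2, k = 0..N-1, as (lower-left corner, side) pairs."""
    side = Fraction(1, num_squares)
    return [((k * side, k * side), side) for k in range(num_squares)]


def total_area(squares):
    """Sum of the areas of the squares."""
    return sum(side * side for _corner, side in squares)


def covers(squares, point):
    """True if the point lies in at least one of the closed squares."""
    return any(cx <= point[0] <= cx + side and cy <= point[1] <= cy + side
               for (cx, cy), side in squares)


# Sample points on the diagonal segment, with exact rational coordinates.
sample_points = [(Fraction(j, 1000), Fraction(j, 1000)) for j in range(1001)]

for n_squares in (10, 100, 1000):
    squares = diagonal_cover(n_squares)
    all_covered = all(covers(squares, p) for p in sample_points)
    print(n_squares, total_area(squares), all_covered)
```

The total area is exactly $1/N$, so the segment has outer measure zero; the whole line is a countable union of such segments.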

There is an even more structural interpretation. The Lebesgue measure $|E|$ is translation invariant:

$$|E+x|=|E|.$$

If $A\in GL(n,\mathbb{R})$ and we define

$$\nu(E)=|AE|,$$

then $\nu$ is again a translation-invariant measure on $\mathbb{R}^n$. Hence $\nu$ should be a constant multiple of Lebesgue measure:

$$\nu(E)=c|E|.$$

The constant is determined by evaluating both measures on the unit cube:

$$c = \nu([0,1)^n) = |A[0,1)^n| = |\det A|.$$

Thus,

$$|AE|=|\det A|\,|E|.$$

This viewpoint is closely related to Haar measure. On the additive group $(\mathbb{R}^n,+)$, Lebesgue measure is the canonical translation-invariant measure: among Borel measures that are finite on compact sets, it is unique up to multiplication by a positive constant.

We now state the main theorem again and prove it. Let $\Phi: \mathbb{R}^n \to \mathbb{R}^n$ be an affine map $\Phi(x) = Ax + b$, where $A \in M_n(\mathbb{R})$ and $b \in \mathbb{R}^n$. Then

$$|\Phi(U)| = |\det(A)|\,|U| \qquad \text{for all } U \subseteq \mathbb{R}^n.$$

Proof.

Reduction. Since $\Phi(U) = A(U) + b$ and outer Lebesgue measure is translation invariant, $|A(U) + b| = |A(U)|$. So it suffices to prove $|A(U)| = |\det(A)|\,|U|$ for all $U \subseteq \mathbb{R}^n$.

Case 1: $\det(A) = 0$. Then $\operatorname{im}(A)$ is a proper linear subspace of $\mathbb{R}^n$, in particular a proper affine subspace, which has measure zero by the previous lemma. Since $A(U) \subseteq \operatorname{im}(A)$, monotonicity gives $|A(U)| = 0 = |\det(A)|\,|U|$.

Case 2: $\det(A) \neq 0$. Then $A$ is invertible. We proceed in five steps.

Step 1: Define a candidate measure $\nu$. Let $\nu: \mathcal{B}(\mathbb{R}^n) \to [0, +\infty]$ be defined by $\nu(U) = |A(U)|$. Since $A$ is a linear bijection, both $A$ and $A^{-1}$ are continuous, so $A$ maps Borel sets to Borel sets. Clearly $\nu(\varnothing) = 0$. For any countable disjoint collection $\{U_i\} \subseteq \mathcal{B}(\mathbb{R}^n)$, the images $A(U_i)$ are disjoint Borel sets, so

$$\begin{aligned} \nu\!\left(\bigsqcup_i U_i\right) &= \left|\bigsqcup_i A(U_i)\right| \\ &= \sum_i |A(U_i)| \\ &= \sum_i \nu(U_i). \end{aligned}$$

Hence $\nu$ is a measure on $\mathcal{B}(\mathbb{R}^n)$.

Step 2: $\nu$ is translation invariant. For any $U \in \mathcal{B}(\mathbb{R}^n)$ and $x \in \mathbb{R}^n$, since $A$ is linear,

$$\begin{aligned} \nu(U + x) &= |A(U + x)| \\ &= |A(U) + Ax| \\ &= |A(U)| \\ &= \nu(U), \end{aligned}$$

where the third equality uses translation invariance of outer measure.

Step 3: $\nu$ agrees with $|\det A|\,|\cdot|$ on half-open cubes.

For any half-open cube $Q = \prod_{i=1}^n [a_i, b_i)$ of side length $s$, write $Q = a + s \cdot [0,1)^n$. By Step 2, $\nu(Q) = \nu(s \cdot [0,1)^n)$. Tile $[0,1)^n$ by $m^n$ disjoint half-open cubes $\{Q_j\}$ of side $\frac{1}{m}$. By translation invariance, all $\nu(Q_j)$ are equal, so $\nu([0,1)^n) = m^n \,\nu\!\left(\left[0,\tfrac{1}{m}\right)^n\right)$. A scaling argument gives $\nu([0,s)^n) = s^n\,\nu([0,1)^n)$ for rational $s$, and monotonicity extends this to all $s > 0$. The image $A([0,1)^n)$ is a parallelepiped whose volume is $|\det A|$ by the geometric interpretation of the determinant. Hence

$$\nu(Q) = s^n\,|\det A| = |\det A|\,|Q|.$$
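The scaling argument for rational side lengths can be spelled out as follows. This is a sketch of one way to fill in the step, writing $s = p/q$ with positive integers $p, q$ (notation introduced here for illustration) and using only additivity and translation invariance of $\nu$.

```latex
\begin{aligned}
\nu\bigl([0,1)^n\bigr) &= q^n\,\nu\bigl([0,\tfrac{1}{q})^n\bigr)
  && \text{($q^n$ translates of $[0,\tfrac{1}{q})^n$ tile $[0,1)^n$),} \\
\nu\bigl([0,\tfrac{p}{q})^n\bigr) &= p^n\,\nu\bigl([0,\tfrac{1}{q})^n\bigr)
  && \text{($p^n$ translates of $[0,\tfrac{1}{q})^n$ tile $[0,\tfrac{p}{q})^n$),} \\
\nu\bigl([0,\tfrac{p}{q})^n\bigr) &= \Bigl(\tfrac{p}{q}\Bigr)^{n}\,\nu\bigl([0,1)^n\bigr)
  && \text{(combine the two lines above).}
\end{aligned}
```

For irrational $s$, one squeezes $[0,s)^n$ between cubes with rational sides $r < s < r'$ and lets $r, r' \to s$.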

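Step 4: Conclude $\nu(U) = |\det A|\,|U|$ on $\mathcal{B}(\mathbb{R}^n)$. Both $\nu$ and $U\mapsto|\det A|\,|U|$ are $\sigma$-finite Borel measures that agree on all half-open cubes. Let $\mathcal{P}$ consist of $\varnothing$ and all half-open dyadic cubes $2^{-m}(z + [0,1)^n)$ with $m \geq 0$ and $z \in \mathbb{Z}^n$. Any two such cubes are either nested or disjoint, so $\mathcal{P}$ is a $\pi$-system, and it generates $\mathcal{B}(\mathbb{R}^n)$ because every open set is a countable union of such cubes. To keep all quantities finite, fix a unit cube $Q_z = z + [0,1)^n$ with $z \in \mathbb{Z}^n$ and define

$$\mathcal{D}_z = \{U \in \mathcal{B}(\mathbb{R}^n) \mid \nu(U \cap Q_z) = |\det A|\,|U \cap Q_z|\}.$$

One checks that $\mathcal{D}_z$ is a $\lambda$-system: $\mathbb{R}^n \in \mathcal{D}_z$ by Step 3 applied to $Q_z$; if $U \subseteq V$ are in $\mathcal{D}_z$, then $V \setminus U \in \mathcal{D}_z$ by additivity, since all quantities involved are bounded by $\nu(Q_z) = |\det A| < \infty$; and $\mathcal{D}_z$ is closed under increasing unions by continuity from below. Moreover $\mathcal{P} \subseteq \mathcal{D}_z$, because each cube in $\mathcal{P}$ is either contained in $Q_z$ or disjoint from it. The $\pi$-$\lambda$ theorem therefore gives $\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{P}) \subseteq \mathcal{D}_z$ for every $z$. Summing over the countably many disjoint cubes $Q_z$, which cover $\mathbb{R}^n$, yields

$$\nu(U) = |\det A|\,|U| \qquad \text{for all } U \in \mathcal{B}(\mathbb{R}^n).$$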

Step 5: Extend to all subsets. For arbitrary $U \subseteq \mathbb{R}^n$, outer regularity gives

$$|U| = \inf\{|V| \mid V \supseteq U,\ V \in \mathcal{B}(\mathbb{R}^n)\}.$$

Since $A$ is a linear bijection, both $A$ and $A^{-1}$ are continuous, so $V \mapsto A(V)$ maps $\mathcal{B}(\mathbb{R}^n)$ bijectively onto itself, and $U \subseteq V$ if and only if $A(U) \subseteq A(V)$. Hence the Borel sets containing $A(U)$ are exactly the sets $A(V)$ with $V \supseteq U$ Borel. Therefore

$$\begin{aligned} |A(U)| &= \inf\{|A(V)| \mid V \supseteq U,\ V \in \mathcal{B}(\mathbb{R}^n)\} \\ &= \inf\{|\det A|\,|V| \mid V \supseteq U,\ V \in \mathcal{B}(\mathbb{R}^n)\} \\ &= |\det A|\,|U|. \end{aligned}$$

References

[1] J. Serra, Analysis II, ETH Zürich lecture notes.

[2] J. M. Lee, Introduction to Smooth Manifolds, 2nd ed., Grad. Texts in Math., vol. 218, Springer, New York, 2013.

[3] A.-L. Cauchy, “Mémoire sur les fonctions qui ne peuvent obtenir que deux valeurs égales et de signes contraires par suite des transpositions opérées entre les variables qu'elles renferment,” 1812.

[4] H. Lebesgue, Intégrale, longueur, aire, Annali di Matematica Pura ed Applicata, 1902.

[5] C. Carathéodory, Vorlesungen über reelle Funktionen, Teubner, 1918.

[6] A. Haar, “Der Massbegriff in der Theorie der kontinuierlichen Gruppen,” Annals of Mathematics, 34 (1933), 147–169.

[7] P. R. Halmos, Measure Theory, Springer, 1950.

[8] G. B. Folland, Real Analysis: Modern Techniques and Their Applications, 2nd ed., Wiley, 1999.

[9] G. W. Stewart, “On the Early History of the Singular Value Decomposition,” SIAM Review, 35(4), 1993, pp. 551–566.

[10] E. Beltrami, “Sulle funzioni bilineari,” Giornale di Matematiche ad Uso degli Studenti Delle Universita, 11, 1873, pp. 98–106.

[11] C. Jordan, “Mémoire sur les formes bilinéaires,” Journal de Mathématiques Pures et Appliquées, 19, 1874, pp. 35–54.

[12] C. Eckart and G. Young, “The approximation of one matrix by another of lower rank,” Psychometrika, 1, 1936, pp. 211–218.

[13] G. H. Golub and C. Reinsch, “Singular Value Decomposition and Least Squares Solutions,” Numerische Mathematik, 14, 1970, pp. 403–420.
